59 research outputs found
Algorithmic fairness and bias mitigation for clinical machine learning with deep reinforcement learning
As models based on machine learning continue to be developed for healthcare applications, greater effort is needed to ensure that these technologies do not reflect or exacerbate any unwanted or discriminatory biases that may be present in the data. Here we introduce a reinforcement learning framework capable of mitigating biases that may have been acquired during data collection. In particular, we evaluated our model for the task of rapidly predicting COVID-19 for patients presenting to hospital emergency departments and aimed to mitigate any site (hospital)-specific and ethnicity-based biases present in the data. Using a specialized reward function and training procedure, we show that our method achieves clinically effective screening performances, while significantly improving outcome fairness compared with current benchmarks and state-of-the-art machine learning methods. We performed external validation across three independent hospitals, and additionally tested our method on a patient intensive care unit discharge status task, demonstrating model generalizability.
Benchmarking transformer-based models for medical record deidentification: a single centre, multi-specialty evaluation
Background Robust de-identification is necessary to preserve patient confidentiality and maintain public acceptance of electronic health record (EHR) research. Manual redaction of personally identifiable information (PII) outside of structured data is time-consuming and expensive, limiting the scale of data-sharing possible. Automated de-identification (DeID) could alleviate this burden, with competing approaches including task-specific models and generalist large language models (LLMs). We aimed to identify the optimal strategy for PII redaction, evaluating a number of task specific transformer-architecture models and generalist LLMs using no- and low-adaptation techniques.
Methods We evaluated the performance of four task-specific models (Microsoft Azure DeID service, AnonCAT, OBI RoBERTa & BERT i2b2 DeID) and five general-purpose LLMs (Gemma-7b-IT, Llama-3-8B-Instruct, Phi-3-mini-128k-instruct, GPT-3.5-turbo-0125, GPT-4-0125) at de-identifying 3650 medical records from a UK hospital group, split into general and specialised datasets. Records were dual-annotated by clinicians for PII. The primary outcomes were F1 score, precision, and recall for each comparator in classifying words as PII vs. non-PII. The secondary outcomes were performance per-PII-subtype per-dataset, and the Levenshtein distance as a proxy for hallucinations/addition of extra text. We report untuned performance for task-specific models and zero-shot performance for LLMs. To assess sensitivity to data shifts between hospital sites, we undertook concept alignment and fine-tuning of one task-specific model (AnonCAT), and performed few-shot (1, 5, and 10) in-context learning for each LLM using site-specific data.
Results 17496/479760 (3.65%) words were PII. Inter-annotator F1 for word-level PII was 0.977 (95%CI 0.957-0.991). The best performing redaction tool was the Microsoft Azure de-identification service: F1 0.939 (0.934-0.944), precision 0.928 (0.922-0.934), recall 0.950 (0.943-0.958). The next-best tools were fine-tuned-AnonCAT: F1 0.910 (0.905-0.914), precision 0.978 (0.973-0.982), recall 0.850 (0.843-0.858), and GPT-4-0125 (ten-shots): F1 0.898 (0.876-0.915), precision 0.874 (0.834-0.906), recall 0.924 (0.914-0.933). There was hallucinatory output in Phi-3-mini-128k-instruct and Llama-3-8B-Instruct at zero-, one-, and five-shots, and universally for Gemma-7b-IT. AnonCAT showed significant improvement in performance on fine-tuning (F1 increase from 0.851; 0.843-0.859 to 0.910; 0.905-0.914). Names/dates were consistently redacted by all comparators; there was variable performance for other categories. Fine-tuned-AnonCAT demonstrated the least performance shift across datasets.
Conclusion Automated EHR de-identification using transformer models could facilitate large-scale, domain-agnostic record sharing for medical research alongside other safeguards to prevent reidentification. Low-adaptation strategies may improve the performance of generalist LLMs and task-specific models.
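As a quick consistency check on the Results paragraph above, the reported F1 scores can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below uses only the point estimates quoted in the abstract; the function name `f1_score` and the dictionary layout are illustrative choices, not part of the study's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Point estimates quoted in the abstract: (precision, recall, reported F1).
reported = {
    "Microsoft Azure DeID": (0.928, 0.950, 0.939),
    "fine-tuned AnonCAT": (0.978, 0.850, 0.910),
    "GPT-4-0125 (ten-shot)": (0.874, 0.924, 0.898),
}

for tool, (precision, recall, f1_reported) in reported.items():
    f1_computed = f1_score(precision, recall)
    print(f"{tool}: computed F1 = {f1_computed:.3f}, reported F1 = {f1_reported:.3f}")
```

The recomputed values agree with the reported ones to within rounding, which also makes the trade-off visible: fine-tuned AnonCAT has the highest precision of the three but the lowest recall.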
Theory of disk accretion onto supermassive black holes
Accretion onto supermassive black holes produces both the dramatic phenomena
associated with active galactic nuclei and the underwhelming displays seen in
the Galactic Center and most other nearby galaxies. I review selected aspects
of the current theoretical understanding of black hole accretion, emphasizing
the role of magnetohydrodynamic turbulence and gravitational instabilities in
driving the actual accretion and the importance of the efficacy of cooling in
determining the structure and observational appearance of the accretion flow.
Ongoing investigations into the dynamics of the plunging region, the origin of
variability in the accretion process, and the evolution of warped, twisted, or
eccentric disks are summarized.
Comment: Mostly introductory review, to appear in "Supermassive black holes in
the distant Universe", ed. A.J. Barger, Kluwer Academic Publishers, in press
The neural engine: a reprogrammable low power platform for closed-loop optogenetics
Brain-machine interfaces (BMIs) hold great potential for treating neurological disorders such as epilepsy. Technological progress is allowing a shift from open-loop, pacemaker-class intervention towards fully closed-loop neural control systems. Low-power programmable processing systems are therefore required that can operate within the 2 °C thermal window for medical implants and maintain long battery life. In this work, we developed a low-power neural engine with an optimized set of algorithms that can operate under a power-cycling domain. By integrating it with a custom-designed brain implant chip, we demonstrated its applicability to closed-loop modulation of neural activity in in-vitro brain tissue: the local field potentials can be modulated at the required central frequency ranges. In addition, in-vivo experiments in a freely moving non-human primate (24 hours) and a rodent (1 hour) were performed to show the system's long-term recording performance. The overall system consumes only 2.93 mA during operation at a biological recording sampling rate of 50 Hz (the lifespan is approximately 56 hours). A library of algorithms for detection, suppression and optical intervention has been implemented to allow for exploratory applications in different neurological disorders. Thermal experiments demonstrated that operation creates minimal heating, and battery performance exceeded 24 hours on a freely moving rodent. Therefore, this technology shows great capabilities both for in-vitro/in-vivo neuroscience applications and for medical implantable processing units.
An assessment of the levels of phthalate esters and metals in the Muledane open dump, Thohoyandou, Limpopo Province, South Africa
Background This work reports the determination of the levels of phthalate esters (dimethyl phthalate (DMP), diethyl phthalate (DEP), dibutyl phthalate (DBP), diethyl hexyl phthalate (DEHP)) and metals (lead, cadmium, manganese, zinc, iron, calcium) in composite soil samples. The soil samples were collected randomly within the Muledane open dump, Thohoyandou, Limpopo province, South Africa. Control samples were collected about 200 m away from the open dump. The phthalate esters were separated and determined by capillary gas chromatography with a flame ionization detector, whilst the metals were determined by atomic absorption spectrophotometry.
Results Open dump values for the phthalate esters and metals were found to be generally higher than those of the control samples. For DMP, DEP, DBP and DEHP, the mean values calculated were 0.31 ± 0.12, 0.21 ± 0.05, 0.30 ± 0.07, and 0.03 ± 0.01 mg/kg, respectively, for the open dump soil samples. The mean open dump values for lead, cadmium, manganese, zinc, iron and calcium were 0.07 ± 0.04, 0.003 ± 0.001, 5.02 ± 1.92, 0.31 ± 0.02, 11.62 ± 9.48 and 0.12 ± 0.13 mg/kg, respectively. The results were compared statistically.
Conclusion Our results revealed that the discarding of wastes into the open dump is a potential source of soil contamination in the immediate vicinity and beyond, via dispersal. Increased levels of phthalate esters and metals in the soil pose a risk to public health, plants and animals. Sustained monitoring of these contaminants is recommended, in addition to upgrading the facility to a landfill.
The impact of surgical delay on resectability of colorectal cancer: An international prospective cohort study
AIM: The SARS-CoV-2 pandemic has provided a unique opportunity to explore the impact of surgical delays on cancer resectability. This study aimed to compare resectability for colorectal cancer patients undergoing delayed versus non-delayed surgery. METHODS: This was an international prospective cohort study of consecutive colorectal cancer patients with a decision for curative surgery (January-April 2020). Surgical delay was defined as an operation taking place more than 4 weeks after treatment decision, in a patient who did not receive neoadjuvant therapy. A subgroup analysis explored the effects of delay in elective patients only. The impact of longer delays was explored in a sensitivity analysis. The primary outcome was complete resection, defined as curative resection with an R0 margin. RESULTS: Overall, 5453 patients from 304 hospitals in 47 countries were included, of whom 6.6% (358/5453) did not receive their planned operation. Of the 4304 operated patients without neoadjuvant therapy, 40.5% (1744/4304) were delayed beyond 4 weeks. Delayed patients were more likely to be older, men, more comorbid, have higher body mass index and have rectal cancer and early stage disease. Delayed patients had higher unadjusted rates of complete resection (93.7% vs. 91.9%, P = 0.032) and lower rates of emergency surgery (4.5% vs. 22.5%, P < 0.001). After adjustment, delay was not associated with a lower rate of complete resection (OR 1.18, 95% CI 0.90-1.55, P = 0.224), which was consistent in elective patients only (OR 0.94, 95% CI 0.69-1.27, P = 0.672). Longer delays were not associated with poorer outcomes. CONCLUSION: One in 15 colorectal cancer patients did not receive their planned operation during the first wave of COVID-19. Surgical delay did not appear to compromise resectability, raising the hypothesis that any reduction in long-term survival attributable to delays is likely to be due to micro-metastatic disease.
The V471A polymorphism in autophagy-related gene ATG7 modifies age at onset specifically in Italian Huntington disease patients
The cause of Huntington disease (HD) is a polyglutamine repeat expansion of more than 36 units in the huntingtin protein, which is inversely correlated with the age at onset of the disease. However, additional genetic factors are believed to modify the course and the age at onset of HD. Recently, we identified the V471A polymorphism in the autophagy-related gene ATG7, a key component of the autophagy pathway that plays an important role in HD pathogenesis, to be associated with the age at onset in a large group of European Huntington disease patients. To confirm this association in a second independent patient cohort, we analysed the ATG7 V471A polymorphism in an additional 1,464 European HD patients of the “REGISTRY” cohort from the European Huntington Disease Network (EHDN). In the entire REGISTRY cohort we could not confirm a modifying effect of the ATG7 V471A polymorphism. However, analysing a modifying effect of ATG7 in these REGISTRY patients and in patients of our previous HD cohort according to their ethnic origin, we identified a significant effect of the ATG7 V471A polymorphism on the HD age at onset only in the Italian population (327 patients). In these Italian patients, the polymorphism is associated with a 6-year earlier disease onset and thus seems to have an aggravating effect. We could specify the role of ATG7 as a genetic modifier for HD particularly in the Italian population. This result affirms the modifying influence of the autophagic pathway on the course of HD, but also suggests population-specific modifying mechanisms in HD pathogenesis.
Deep reinforcement learning for multi-class imbalanced training: applications in healthcare
With the rapid growth of memory and computing power, datasets are becoming increasingly complex and imbalanced. This is especially severe in the context of clinical data, where there may be one rare event for many cases in the majority class. We introduce an imbalanced classification framework, based on reinforcement learning, for training extremely imbalanced data sets, and extend it for use in multi-class settings. We combine dueling and double deep Q-learning architectures, and formulate a custom reward function and episode-training procedure, specifically with the capability of handling multi-class imbalanced training. Using real-world clinical case studies, we demonstrate that our proposed framework outperforms current state-of-the-art imbalanced learning methods, achieving more fair and balanced classification, while also significantly improving the prediction of minority classes.
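The two ingredients named in the abstract above, dueling and double deep Q-learning, can be sketched in a few lines. This is a minimal illustration of the standard textbook formulations, assuming one action per class label; it is not the authors' implementation, and the abstract's custom reward function and episode-training procedure are not specified there, so they are not reproduced. The function names and toy numbers are hypothetical.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def double_dqn_target(
    reward: float,
    done: bool,
    q_next_online: List[float],
    q_next_target: List[float],
    gamma: float,
) -> float:
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    best_action = max(range(len(q_next_online)), key=q_next_online.__getitem__)
    bootstrap = 0.0 if done else gamma * q_next_target[best_action]
    return reward + bootstrap


# Toy three-class example with made-up numbers (one action per class label).
q_online = dueling_q_values(state_value=1.0, advantages=[0.2, -0.2, 0.0])
q_target = [0.9, 1.1, 1.0]  # hypothetical target-network outputs for the next state

target_continue = double_dqn_target(1.0, False, q_online, q_target, gamma=0.9)
target_terminal = double_dqn_target(-1.0, True, q_online, q_target, gamma=0.9)
print(q_online, target_continue, target_terminal)
```

Separating action selection (online network) from action evaluation (target network) is what distinguishes the double variant from plain deep Q-learning, where a single network does both and tends to overestimate values.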
Comparison of (1 + α) Fractional-Order Transfer Functions to Approximate Lowpass Butterworth Magnitude Responses
Artificial intelligence driven assessment of routinely collected healthcare data is an effective screening test for COVID-19 in patients presenting to hospital
Background Rapid identification of COVID-19 is important for delivering care expediently and maintaining infection control. The early clinical course of SARS-CoV-2 infection can be difficult to distinguish from other undifferentiated medical presentations to hospital, however for operational reasons SARS-CoV-2 PCR testing can take up to 48 hours. Artificial Intelligence (AI) methods, trained using routinely collected clinical data, may allow front-door screening for COVID-19 within the first hour of presentation.
Methods Demographic, routine and prior clinical data were extracted for 170,510 sequential presentations to emergency and acute medical departments at a large UK teaching hospital group. We applied multivariate logistic regression, random forests and extreme gradient boosted trees to distinguish emergency department (ED) presentations and admissions due to COVID-19 from pre-pandemic controls. We performed stepwise addition of clinical feature sets and assessed performance using stratified 10-fold cross validation. Models were calibrated during training to achieve sensitivities of 70, 80 and 90% for identifying patients with COVID-19. To simulate real-world performance at different stages of an epidemic, we generated test sets with varying prevalences of COVID-19 and assessed predictive values. We prospectively validated our models for all patients presenting or admitted to our hospital group between 20th April and 6th May 2020, comparing model predictions to PCR test results.
Results Presentation laboratory blood tests, point of care blood gas, and vital signs measurements for 115,394 emergency presentations and 72,310 admissions were analysed. Presentation laboratory tests and vital signs were most predictive of COVID-19 (maximum area under ROC curve [AUROC] 0.904 and 0.823, respectively). Sequential addition of informative variables improved model performance to AUROC 0.942. We developed two early-detection models to identify COVID-19, achieving sensitivities and specificities of 77.4% and 95.7% for our ED model amongst patients attending hospital, and 77.4% and 94.8% for our Admissions model amongst patients being admitted. Both models offer high negative predictive values (>99%) across a range of prevalences (<5%). In a two-week prospective validation period, our ED and Admissions models demonstrated 92.3% and 92.5% accuracy (AUROC 0.881 and 0.871 respectively) for all patients presenting or admitted to a large UK teaching hospital group. A sensitivity analysis to account for uncertainty in negative PCR results improves apparent accuracy (95.1% and 94.1%) and NPV (99.0% and 98.5%). Three laboratory blood markers, Eosinophils, Basophils, and C-Reactive Protein, alongside Calcium measured on blood-gas, and presentation Oxygen requirement were the most informative variables in our models.
Conclusion Artificial intelligence techniques perform effectively as a screening test for COVID-19 in emergency departments and hospital admission units. Our models support rapid exclusion of the illness using routinely collected and readily available clinical measurements, guiding streaming of patients during the early phase of admission.
Brief The early clinical course of SARS-CoV-2 infection can be difficult to distinguish from other undifferentiated medical presentations to hospital, however viral specific real-time polymerase chain reaction (RT-PCR) testing has limited sensitivity and can take up to 48 hours for operational reasons. In this study, we develop two early-detection models to identify COVID-19 using routinely collected data typically available within one hour (laboratory tests, blood gas and vital signs) during 115,394 emergency presentations and 72,310 admissions to hospital. Our emergency department (ED) model achieved 77.4% sensitivity and 95.7% specificity (AUROC 0.939) for COVID-19 amongst all patients attending hospital, and Admissions model achieved 77.4% sensitivity and 94.8% specificity (AUROC 0.940) for the subset admitted to hospital. Both models achieve high negative predictive values (>99%) across a range of prevalences (<5%), facilitating rapid exclusion during triage to guide infection control. We prospectively validated our models across all patients presenting and admitted to a large UK teaching hospital group in a two-week test period, achieving 92.3% (n=3,326, NPV: 97.6%, AUROC: 0.881) and 92.5% accuracy (n=1,715, NPV: 97.7%, AUROC: 0.871) in comparison to RT-PCR results. Sensitivity analyses to account for uncertainty in negative PCR results improves apparent accuracy (95.1% and 94.1%) and NPV (99.0% and 98.5%). Our artificial intelligence models perform effectively as a screening test for COVID-19 in emergency departments and hospital admission units, offering high impact in settings where rapid testing is unavailable.
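The abstract above reports negative predictive values alongside sensitivity and specificity at varying prevalences. The sketch below shows how predictive values follow from those three quantities via Bayes' rule. It uses the ED model's reported sensitivity (77.4%) and specificity (95.7%); the prevalence values are illustrative assumptions rather than the study's simulated test sets, and the function name is hypothetical.

```python
from typing import Tuple


def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied at the given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# ED model operating point from the abstract; prevalences chosen for illustration.
ED_SENSITIVITY, ED_SPECIFICITY = 0.774, 0.957

for prevalence in (0.01, 0.02, 0.04):
    ppv, npv = predictive_values(ED_SENSITIVITY, ED_SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

At low prevalence most negative predictions are true negatives, so NPV stays high even with moderate sensitivity, while PPV is much more sensitive to prevalence. This is why the models are framed as tools for rapid exclusion.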
