64 research outputs found

    Accuracy of identifying incident stroke cases from linked healthcare data in UK Biobank

    No full text
    Objective: In UK Biobank (UKB), a large population-based prospective study, cases of many diseases are ascertained through linkage to routinely collected, coded national health datasets. We assessed the accuracy of these datasets for identifying incident strokes. Methods: In a regional UKB sub-population (n=17,249), we identified all participants with ≥1 code signifying a first stroke after recruitment (incident stroke-coded cases) in linked hospital admission, primary care or death record data. Stroke physicians reviewed their full electronic patient records (EPRs) and generated reference standard diagnoses. We evaluated the number and proportion of cases that were true positives (i.e. the positive predictive value, PPV) for all codes combined and by code source and type. Results: Of 232 incident stroke-coded cases, 97% had EPR information available. Data sources were: 30% hospital admission only; 39% primary care only; 28% hospital and primary care; 3% death records only. While 42% of cases were coded as unspecified stroke type, review of EPRs enabled a pathological type to be assigned in >99%. PPVs (95% confidence intervals) were: 79% (73%-84%) for any stroke (89% for hospital admission codes, 80% for primary care codes) and 83% (74%-90%) for ischemic stroke. PPVs for the small numbers of death record and hemorrhagic stroke codes were low but imprecise. Conclusions: Stroke and ischemic stroke cases in UKB can be ascertained through linked health datasets with sufficient accuracy for many research studies. Further work is needed to understand the accuracy of death record and hemorrhagic stroke codes and to develop scalable approaches for better identifying stroke types.
    Rannikmae, Kristiina (2021). Accuracy of identifying incident stroke cases from linked healthcare data in UK Biobank [Dataset]. Dryad. https://doi.org/10.5061/dryad.w9ghx3fk
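    A PPV is simply the proportion of coded cases confirmed as true positives on record review. The sketch below computes a PPV with a 95% confidence interval. It assumes the Wilson score interval, which is one common choice; the abstract does not say which interval method the study used, and the counts below are hypothetical, not the study's data.

```python
import math


def ppv_with_wilson_ci(true_positives: int, coded_cases: int, z: float = 1.96):
    """Return (PPV, lower, upper) for the share of coded cases confirmed as true positives.

    The interval is the Wilson score interval (an assumed method, chosen for
    illustration); z = 1.96 gives an approximate 95% interval.
    """
    if coded_cases <= 0:
        raise ValueError("coded_cases must be positive")
    if not 0 <= true_positives <= coded_cases:
        raise ValueError("true_positives must lie between 0 and coded_cases")
    p = true_positives / coded_cases
    z2 = z * z
    denom = 1 + z2 / coded_cases
    centre = (p + z2 / (2 * coded_cases)) / denom
    half_width = z * math.sqrt(p * (1 - p) / coded_cases + z2 / (4 * coded_cases ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the extremes.
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical counts: 80 of 100 stroke-coded cases confirmed on EPR review.
ppv, lo, hi = ppv_with_wilson_ci(80, 100)
print(f"PPV {ppv:.0%} (95% CI {lo:.0%}-{hi:.0%})")
```

    Unlike the simple normal approximation, the Wilson interval stays inside [0, 1] and behaves reasonably for small counts, which matters for sparse subgroups such as death record or hemorrhagic stroke codes.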

    Register for Asthma Research (REACH) database

    No full text
    The dataset is archived in the Edinburgh DataVault. Access is restricted to authorised University of Edinburgh users only. The REACH database contains information on people across the UK who have asthma and have agreed to be contacted about potential asthma research.
    Jackson, T. "Register for Asthma Research (REACH) database" [dataset] (2020). https://doi.org/10.7488/aa64a408-a275-43aa-af69-3da059be271

    DataVault: Baillie Lab baillie_fantom5 'Vault B'

    No full text
    Access: This dataset is held in the Edinburgh DataVault, directly accessible only to authorised University of Edinburgh staff. External users may request access to a copy of the data by contacting the Principal Investigator, Contact Person or Data Manager named on this page. University of Edinburgh users who wish to have direct access should consult the information about retrieving data from the DataVault at: https://www.ed.ac.uk/is/research-support/datavault
    Baillie legacy fantom5 data from lx05 archived 202

    RNA composition metrics

    No full text
    1) Supplementary File 1: nucleotide sequences of mutants used in the study (Supplementary File 1.doc)
    2) Composition and codon usage metrics for human mRNA (Human mRNA composition.xls)
    3) Composition and codon usage metrics for A. thaliana mRNA (A.thaliana Composition table.xls)
    4) Composition and codon usage metrics for C. elegans mRNA (C.elegans_Composition table.xls)
    5) Composition and codon usage metrics for E. coli mRNA (E.coli_Composition table.xls)
    6) Codon pair usage tables for human, A. thaliana, C. elegans and E. coli (CPS tables.xls)
    Mutating RNA virus genomes to alter codon pair (CP) frequencies and reduce translation efficiency has been advocated as a method to generate safe, attenuated virus vaccines. However, selection for disfavoured CPs leads to unintended increases in CpG and UpA dinucleotide frequencies that also attenuate replication. We designed and phenotypically characterised mutants of the picornavirus echovirus 7 in which these parameters were varied independently to determine which most influenced virus replication. CpG and UpA dinucleotide frequencies primarily influenced virus replication ability, whereas no fitness differences were observed between mutants with different CP usage when dinucleotide frequencies were kept constant. In contrast, translation efficiency was unaffected by either CP usage or dinucleotide frequencies. This mechanistic insight is critical for the future rational design of live virus vaccines and their safety evaluation: attenuation is mediated through enhanced innate immune responses to viruses with elevated CpG/UpA dinucleotide frequencies, rather than the viruses themselves being intrinsically defective.
    Simmonds, Peter. (2014). RNA composition metrics, 2014 [dataset]. University of Edinburgh. Roslin Institute. http://dx.doi.org/10.7488/ds/188
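    CpG and UpA frequencies are commonly reported as an observed/expected (O/E) ratio: the observed dinucleotide frequency divided by the product of the two base frequencies, so that a value near 1 means no over- or under-representation. The sketch below assumes that definition; the exact metrics in the composition tables above may be defined differently, and the toy sequence is hypothetical.

```python
from collections import Counter


def dinucleotide_obs_exp(seq: str, first: str, second: str) -> float:
    """Observed/expected ratio of the dinucleotide first+second in an RNA sequence.

    Observed = count of the pair / number of overlapping pairs (len - 1).
    Expected = frequency(first) * frequency(second), i.e. the pair frequency
    expected if bases were arranged independently.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as RNA
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two bases")
    base_counts = Counter(seq)
    pair_counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    observed = pair_counts[first + second] / (len(seq) - 1)
    expected = (base_counts[first] / len(seq)) * (base_counts[second] / len(seq))
    # If either base is absent the ratio is undefined.
    return observed / expected if expected else float("nan")


seq = "AUGCGAUUACGUUAGCGAUA"  # hypothetical toy sequence
print("CpG O/E:", round(dinucleotide_obs_exp(seq, "C", "G"), 2))
print("UpA O/E:", round(dinucleotide_obs_exp(seq, "U", "A"), 2))
```

    Counting overlapping pairs means every internal base contributes to two dinucleotides, which is the usual convention for genome composition analyses.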

    Classification of Gastroparesis from Glycemic Variability in Type 1 Diabetes: A Proof-of-Concept Study

    No full text
    Background and Objective: Delayed gastric emptying is a substantial challenge for people with diabetes, affecting quality of life and blood glucose regulation. The complication is underdiagnosed, and current diagnostic tests are expensive, time-consuming or of modest accuracy. The assessment of glycemic variability has potential use in gastroparesis screening. The aim of this study was to investigate differences in glycemic variability between patients with type 1 diabetes with and without a diagnosis of gastroparesis, and the potential of a classification model to differentiate between the groups. Methods: Continuous glucose monitoring (CGM) data from 425 patients with diabetes were included in the analytic cohort: 16 patients with a diagnosis of gastroparesis and 409 without a known gastroparesis diagnosis. Sixteen features (9 daytime and 7 nighttime) describing glucose dynamics were extracted to assess differences between patients with and without a diagnosis of gastroparesis. A logistic regression model was trained using forward feature selection and cross-validation. Results: Forward selection with cross-validation retained 3 features in the model: mean absolute glucose (MAG), span, and standard deviation during the night. The area under the receiver operating characteristic curve (ROC AUC) for the classification model was 0.76. Conclusions: Gastroparesis seems to affect glucose variability, especially during the night. Moreover, CGM could possibly be used as part of the screening process for delayed gastric emptying, but more studies are needed to determine its realistic accuracy.
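    The three selected features can be illustrated on a window of CGM readings. This sketch assumes common definitions that the abstract does not spell out: MAG as the sum of absolute changes between consecutive readings divided by elapsed hours, span as maximum minus minimum, and SD as the sample standard deviation. The 00:00-06:00 night window, the mg/dL units and the readings are all hypothetical.

```python
import statistics
from datetime import datetime, timedelta


def glucose_features(times, glucose):
    """MAG, span and SD for one window of CGM readings (mg/dL assumed).

    MAG: sum of absolute changes between consecutive readings / elapsed hours.
    Span: maximum minus minimum reading.
    SD: sample standard deviation of the readings.
    """
    if len(glucose) < 2 or len(times) != len(glucose):
        raise ValueError("need at least two paired readings")
    hours = (times[-1] - times[0]).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("readings must span a positive time interval")
    total_change = sum(abs(b - a) for a, b in zip(glucose, glucose[1:]))
    return {
        "mag": total_change / hours,
        "span": max(glucose) - min(glucose),
        "sd": statistics.stdev(glucose),
    }


def night_window(times, glucose, start_hour=0, end_hour=6):
    """Keep readings whose clock hour lies in [start_hour, end_hour).

    The 00:00-06:00 default is a hypothetical choice, not the study's definition.
    """
    kept = [(t, g) for t, g in zip(times, glucose) if start_hour <= t.hour < end_hour]
    return [t for t, _ in kept], [g for _, g in kept]


# Hypothetical readings every 5 minutes from 05:45; the 06:00 reading falls outside the night window.
start = datetime(2024, 1, 1, 5, 45)
times = [start + timedelta(minutes=5 * i) for i in range(4)]
glucose = [100, 110, 105, 140]
night_times, night_glucose = night_window(times, glucose)
features = glucose_features(night_times, night_glucose)
print(features)
```

    Features like these, computed separately for day and night windows, would form the candidate pool from which forward selection picks predictors for the logistic regression.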

    PyEHR: A Predictive Modeling Toolkit for Electronic Health Records

    No full text
    The repository (GitHub: https://github.com/yhzhu99/pyehr) is a practical implementation of the arXiv paper "A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care". It includes various machine learning and deep learning models for predictive modeling tasks on Electronic Health Records (EHR), specifically for COVID-19 patients in Intensive Care Units (ICU). Benchmarking results on two real-world COVID-19 EHR datasets (TJH and CDSL) are also provided. All results and trained models are freely accessible on the PyEHR online platform, and the source code is in this repository. We invite clinicians, researchers, and data scientists to contribute to this growing platform.
    Prediction tasks: the repository implements mortality outcome prediction; (early) length-of-stay prediction; and multi-task/two-stage prediction (predicting mortality outcome and length-of-stay simultaneously).
    Model zoo: the repository contains traditional machine learning, basic deep learning, and advanced deep learning models tailored for EHR data.
    Machine learning models: random forest (RF), decision tree (DT), gradient boosting decision tree (GBDT), XGBoost, CatBoost.
    Deep learning models: multi-layer perceptron (MLP), recurrent neural network (RNN), long short-term memory network (LSTM), gated recurrent units (GRU), temporal convolutional networks, Transformer.
    EHR predictive models: RETAIN, StageNet, Dr. Agent, AdaCare, ConCare, GRASP.
    The best hyperparameters found by search for each model are stored in the configs/ folder (dl.py and ml.py).
    Repository structure: the code repository has the following directory layout:
        pyehr/
        ├── losses/           # losses designed for the tasks
        ├── metrics/          # metrics for the tasks
        ├── models/           # backbone ML or DL models
        ├── configs/          # best hyperparameters found by search, plus dataset-related configs
        ├── datasets/         # datasets and pre-processing scripts
        ├── pipelines/        # DL and ML pipelines built on the PyTorch Lightning framework
        ├── tune.py           # hyperparameter search with WandB
        ├── train.py          # train models
        ├── test.py           # test models
        └── requirements.txt  # code dependencies
    Data format: the inputs fed to the pipelines should have the following format.
    x.pkl: an (N, T, D) list, where N is the number of patients, T is the number of time steps, and D is the number of features. Along the D dimension, the first x features are demographic features and the next y features are lab test features, where x + y = D.
    y.pkl: an (N, T, 2) list, where the 2 values are [outcome, length-of-stay] for each time step.
    los_info.pkl: a dictionary containing length-of-stay (LOS) statistics, e.g. the mean and standard deviation of the LOS values. Because the LOS labels have been z-score normalized, these statistics are needed to recover the raw LOS values.
    Requirements: Python 3.8+ and PyTorch 2.0 (with Lightning AI). See requirements.txt for additional dependencies.
    Usage: to pre-process the data, download the TJH dataset from the paper "An interpretable mortality prediction model for COVID-19 patients", and apply for the COVID Data Save Lives (CDSL) dataset if needed. Then run the pre-processing notebooks preprocess_{dataset}.ipynb in the datasets/ folder to obtain the 10-fold processed datasets in the required data format.
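    The data format described above can be exercised with plain Python lists. This sketch writes toy x.pkl, y.pkl and los_info.pkl files, reads them back, splits the D features into demographic and lab-test parts, and reverses the z-score normalization of the LOS label. The helper names and the dictionary keys "los_mean" and "los_std" are hypothetical; check the actual keys in the repository's los_info.pkl before relying on them.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickle(path):
    """Load one pickled object (e.g. x.pkl, y.pkl or los_info.pkl)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def split_features(x, num_demographic):
    """Split each time step's D features of an (N, T, D) list into demographic and lab parts."""
    demographic = [[step[:num_demographic] for step in patient] for patient in x]
    lab = [[step[num_demographic:] for step in patient] for patient in x]
    return demographic, lab


def raw_los(y, los_info):
    """Recover raw LOS from an (N, T, 2) y list whose index 1 holds the z-scored LOS.

    The keys "los_mean" and "los_std" are assumed names (hypothetical).
    """
    mean, std = los_info["los_mean"], los_info["los_std"]
    return [[step[1] * std + mean for step in patient] for patient in y]


# Hypothetical toy data: N=1 patient, T=2 time steps, D=3 (1 demographic + 2 lab features).
x = [[[0.5, 1.2, -0.3], [0.5, 0.9, 0.1]]]
y = [[[0, -0.5], [0, 0.25]]]
los_info = {"los_mean": 10.0, "los_std": 4.0}

with tempfile.TemporaryDirectory() as tmp:
    for name, obj in {"x.pkl": x, "y.pkl": y, "los_info.pkl": los_info}.items():
        with open(Path(tmp) / name, "wb") as f:
            pickle.dump(obj, f)
    x_in = load_pickle(Path(tmp) / "x.pkl")
    y_in = load_pickle(Path(tmp) / "y.pkl")
    info = load_pickle(Path(tmp) / "los_info.pkl")

demographic, lab = split_features(x_in, num_demographic=1)
print(demographic, lab)
print(raw_los(y_in, info))  # de-normalized LOS for each time step
```

    De-normalization is simply the inverse of the z-score, raw = z * std + mean, applied to model predictions as well as labels when reporting LOS in its original units.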
    To start training or testing, use the following commands:
        # Hyperparameter tuning
        python dl_tune.py  # for deep learning models
        python ml_tune.py  # for machine learning models
        # Model training
        python train.py
        # Model testing
        python test.py
    Zhu, Y., Wang, W., Gao, J., & Ma, L. (2024). PyEHR: A Predictive Modeling Toolkit for Electronic Health Records. Zenodo. https://doi.org/10.5281/zenodo.1057356

    Blood flow through compressed networks - critical bifurcations

    No full text
    The tumour microenvironment is abnormal, and one consequence is that its blood vessels are compressed. Vessel compression correlates with reduced survival rates, whereas decompressing vessels improves tissue oxygenation and increases survival rates. It was previously shown that, at a single vascular bifurcation, vessel compression increases the heterogeneity of red blood cell transport. This dataset contains the results of a simulation of blood flow (Poiseuille flow) through a compressed network, showing the effect of vessel compression on haematocrit distribution. It also contains the critical bifurcations in the network that are responsible for diverting blood flow away from the compressed region.
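    Under Poiseuille flow a vessel's hydraulic resistance is R = 8μL/(πr⁴), so compression that narrows the radius raises resistance steeply and diverts flow elsewhere. The sketch below shows this at a single bifurcation, assuming both daughter vessels share the same inlet and outlet pressures; it does not model the haematocrit split (phase separation) or a full network pressure solve, and the viscosity and vessel dimensions are hypothetical.

```python
import math


def poiseuille_resistance(length, radius, viscosity):
    """Hydraulic resistance of a cylindrical vessel: R = 8 * mu * L / (pi * r**4)."""
    return 8 * viscosity * length / (math.pi * radius ** 4)


def split_flow(inlet_flow, resistances):
    """Split an inlet flow among parallel vessels sharing inlet and outlet pressures.

    Each vessel carries Q_i = dP / R_i, so its share is proportional to 1 / R_i.
    """
    conductances = [1 / r for r in resistances]
    total = sum(conductances)
    return [inlet_flow * g / total for g in conductances]


mu = 3.5e-3  # Pa*s, hypothetical blood viscosity
r_open = poiseuille_resistance(length=1e-3, radius=5e-6, viscosity=mu)
r_comp = poiseuille_resistance(length=1e-3, radius=2.5e-6, viscosity=mu)  # radius halved by compression
flows = split_flow(1.0, [r_open, r_comp])
print("resistance ratio:", r_comp / r_open)
print("flow fractions:", flows)
```

    Halving the radius multiplies resistance by 2⁴ = 16, so the compressed branch receives only 1/17 of the inlet flow; this r⁴ sensitivity is why a few bifurcations can divert flow away from a compressed region.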

    [CODE] COVID-19 EHR Benchmark - Online Platform

    No full text
    The repository (GitHub: https://github.com/yhzhu99/pyehr-playground) is a practical implementation of the arXiv paper "A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care".
    Introduction: The COVID-19 EHR Benchmark is a comprehensive platform for predictive modeling using Electronic Health Records (EHR) in Intensive Care Units (ICUs). This benchmarking effort is the first of its kind for patient-level COVID-19 prediction tasks in ICUs. The main goal is to provide a reliable and standardized benchmark for researchers and practitioners in the field.
    Features: a Performance Table giving the average score across all folds; a Detailed Performance Table with comprehensive performance details for each fold; and all model checkpoints and logs, available via Google Drive.
    Background: To the best of our knowledge, this is the first benchmarking effort for patient-level COVID-19 prediction tasks in ICUs. We have made our code publicly available, enabling others to build complete benchmarks and reproduce all results. Our well-structured data preprocessing and modeling modules can also be easily applied to generate customized tasks and results. The benchmark code and documentation are available in the GitHub repository. Moreover, we have released all the benchmark experiment results and trained models on this online platform, which includes model performance for all hyperparameter combinations for both tasks and makes the results easy to query and download. Further details on data and code availability, including the benchmark code structure, are provided in the Supplementary Materials.
    Access: The platform can be accessed at https://pyehr.netlify.app.
    Zhu, Y., Wang, W., Gao, J., & Ma, L. (2024). COVID-19 EHR Benchmark - Online Platform. Zenodo. https://doi.org/10.5281/zenodo.1057364
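    The Performance Table reports the average score across folds, while the Detailed Performance Table keeps per-fold values. The sketch below shows that aggregation step for a list of per-fold metric dictionaries; the metric names and values are hypothetical, and the standard deviation is an extra summary added here for illustration, not something the platform is stated to report.

```python
import statistics


def summarize_folds(fold_scores):
    """Return {metric: (mean, sample SD)} over a list of per-fold metric dicts."""
    if not fold_scores:
        raise ValueError("need at least one fold")
    summary = {}
    for metric in fold_scores[0]:
        values = [fold[metric] for fold in fold_scores]
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[metric] = (statistics.mean(values), sd)
    return summary


# Hypothetical per-fold scores, not results from the benchmark.
folds = [
    {"auroc": 0.90, "auprc": 0.80},
    {"auroc": 0.92, "auprc": 0.84},
    {"auroc": 0.88, "auprc": 0.82},
]
summary = summarize_folds(folds)
for metric, (mean, sd) in summary.items():
    print(f"{metric}: {mean:.3f} +/- {sd:.3f}")
```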