    Validation & Vindication – Comparing Electronic Health Records with Hospital Notes

    Introduction No doubt your Electronic Health Records have been meticulously gathered, imported, validated and standardised. However, if you want to be certain that they are an accurate representation of reality, you can’t beat physically going to hospitals and cross-checking their records against yours. Our biobank did exactly this. Objectives and Approach Our validation exercise encompassed all reported cases in our follow-up data of three key conditions: stroke, heart disease, and cancer. Key data about each hospitalisation was extracted and exported to tablet computers running custom software. Our staff then visited each hospital in this dataset seeking the corresponding medical notes, and collected additional data from those that they found including photographs of key documents. These results were then adjudicated by specialist physicians to determine the accuracy of the diagnosis, and identify disease phenotypes of interest. Finally, all these results were merged back into our follow-up data. Results Not only was gathering the data a huge logistical and technical challenge, integrating it back into the database presented its own difficulties. Our initial plan was to assign each sought event a status of ‘validated’, ‘corrected’ or ‘unfound’. However, this proved inadequate for addressing the complexities of the data, as we will discuss, with examples. Our solution was to initially treat the retrieved hospital notes as simply another source of follow-up data. We were thus able to use our existing systems for validating, standardising and aggregating events; and thus produce validated endpoints that were meaningfully comparable to our reported endpoints. We could then implement and test definitions of the required validation statuses at a participant level for each disease of interest. 
Conclusion/Implications This validation project was a huge and daunting undertaking, but repaid our investment with proof that our Electronic Health Records were generally very reliable, and also with much richer data about disease diagnosis and phenotyping. Other projects using Electronic Health Records may wish to adopt this approach

    Meta Matters - Enriching & Exploiting Your Metadata

    Introduction Data is nothing without context: if you don't know how, when or why a variable was gathered, it's nigh impossible to draw conclusions from it. This presentation discusses different sorts of metadata and how they can be gathered, stored, and used to enrich data; drawing examples from our biobank. Objectives and Approach Each data item has two types of metadata: variable-level and value-level. For example, consider a questionnaire. The variable-level metadata covers each question: exact wording, validation rules for the answers, etc. The value-level metadata covers each individual answer: details of the questioner, date and time of response, and so on. We also have database-level metadata: datasets which list every dataset or every field in the database. While some of this information needs to be gathered alongside the data itself, much can be extracted or imputed from results or documentation. We present some generalizable examples. Results Like any other data, metadata is only worth having if you’re using it. We will present principles and examples of applications that we have developed for it: • Data management – Deriving useful variables and tables, and helping to make your data easier to parse, extract, and validate. • Presentation – Making your data more human-readable by labelling variables and decoding values. • Documentation – Metadata tables make ideal repositories for granular institutional knowledge about your data: known issues, potential pitfalls, or explanations for missing values. • Analysis – Identifying which metadata variables are most valuable for analysts, and how best to provide them. • Automation – Using the metadata to generate code that can automatically produce summary statistics, tables, graphs… and more metadata! Conclusion/Implications Every dataset comes with some metadata. When examined and built upon, it can deepen understanding of the data within, as well as becoming a powerful resource in its own right
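    The "Presentation" use of metadata described above (labelling variables and decoding values) can be sketched roughly as follows. This is only an illustration under stated assumptions: the variable names, labels, and code lists are hypothetical and are not taken from the biobank's actual metadata tables.

```python
# Hypothetical variable-level metadata: one entry per questionnaire variable.
# The names, labels and code lists are invented for illustration only.
VARIABLE_METADATA = {
    "smoke_status": {
        "label": "Smoking status",
        "codes": {0: "Never", 1: "Former", 2: "Current"},
    },
    "sbp": {"label": "Systolic blood pressure (mmHg)", "codes": None},
}


def decode_record(record, metadata):
    """Return a human-readable copy of `record`.

    Variable names are replaced by their labels and coded values by their
    decoded text. Variables with no metadata entry are passed through as-is.
    """
    decoded = {}
    for name, value in record.items():
        meta = metadata.get(name)
        if meta is None:
            decoded[name] = value  # no metadata available: keep raw
            continue
        codes = meta["codes"]
        # Decode only when a code list exists; unknown codes stay raw.
        decoded[meta["label"]] = codes.get(value, value) if codes else value
    return decoded


row = {"smoke_status": 2, "sbp": 131, "participant_id": "P001"}
readable = decode_record(row, VARIABLE_METADATA)
print(readable)
```

    Keeping labels and code lists in a metadata table rather than in the decoding function means the same function could serve any dataset that has such a table.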

    Using routine data to monitor inequalities in an acute trust: a retrospective study

    Background Reducing inequalities is one of the priorities of the National Health Service. However, there is no standard system for monitoring inequalities in the care provided by acute trusts. We explore the feasibility of monitoring inequalities within an acute trust using routine data. Methods A retrospective study of hospital episode statistics from one acute trust in London over three years (2007 to 2010). Waiting times, length of stay and readmission rates were described for seven common surgical procedures. Inequalities by age, sex, ethnicity and social deprivation were examined using multiple logistic regression, adjusting for the other socio-demographic variables and comorbidities. Sample size calculations were performed to estimate how many years of data would be ideal for this analysis. Results This study found that even in a large acute trust, there was not enough power to detect differences between subgroups. There was little evidence of inequalities for the outcome and process measures examined: statistically significant differences by age, sex, ethnicity or deprivation were found in only 11 out of 80 analyses. Bariatric surgery patients who were black African or Caribbean were more likely than white patients to experience a prolonged wait (longer than 64 days, aOR = 2.47, 95% CI: 1.36-4.49). Following a coronary angioplasty, patients from more deprived areas were more likely to have had a prolonged length of stay (aOR = 1.66, 95% CI: 1.25-2.20). Conclusions This study found difficulties in using routine data to identify inequalities on a trust level. Little evidence of inequalities in waiting time, length of stay or readmission rates by sex, ethnicity or social deprivation was identified, although some differences were identified which warrant further investigation. Even with three years of data from a large trust there was little power to detect inequalities by procedure. Data will therefore need to be pooled from multiple trusts to detect inequalities.

    Genotyping and population characteristics of the China Kadoorie Biobank

    The China Kadoorie Biobank (CKB) is a population-based prospective cohort of >512,000 adults recruited from 2004 to 2008 from 10 geographically diverse regions across China. Detailed data from questionnaires and physical measurements were collected at baseline, with additional measurements at three resurveys involving ∼5% of surviving participants. Analyses of genome-wide genotyping, for >100,000 participants using custom-designed Axiom arrays, reveal extensive relatedness, recent consanguinity, and signatures reflecting large-scale population movements from recent Chinese history. Systematic genome-wide association studies of incident disease, captured through electronic linkage to death and disease registries and to the national health insurance system, replicate established disease loci and identify 14 novel disease associations. Together with studies of candidate drug targets and disease risk factors and contributions to international genetics consortia, these demonstrate the breadth, depth, and quality of the CKB data. Ongoing high-throughput omics assays of collected biosamples and planned whole-genome sequencing will further enhance the scientific value of this biobank

    Long-term ambient air pollution exposure and cardio-respiratory disease in China: findings from a prospective cohort study

    Background Existing evidence on long-term ambient air pollution (AAP) exposure and risk of cardio-respiratory diseases in China is mainly on mortality, and based on area average concentrations from fixed-site monitors for individual exposures. Substantial uncertainty persists, therefore, about the shape and strength of the relationship when assessed using more personalised individual exposure data. We aimed to examine the relationships between AAP exposure and risk of cardio-respiratory diseases using predicted local levels of AAP. Methods A prospective study included 50,407 participants aged 30–79 years from Suzhou, China, with concentrations of nitrogen dioxide (NO2), sulphur dioxide (SO2), fine (PM2.5), and inhalable (PM10) particulate matter, ozone (O3) and carbon monoxide (CO) and incident cases of cardiovascular disease (CVD) (n = 2,563) and respiratory disease (n = 1,764) recorded during 2013–2015. Cox regression models with time-dependent covariates were used to estimate adjusted hazard ratios (HRs) for diseases associated with local-level concentrations of AAP exposure, estimated using Bayesian spatio–temporal modelling. Results The study period of 2013–2015 included a total of 135,199 person-years of follow-up for CVD. There was a positive association of AAP, particularly SO2 and O3, with risk of major cardiovascular and respiratory diseases. Each 10 µg/m3 increase in SO2 was associated with adjusted hazard ratios (HRs) of 1.07 (95% CI: 1.02, 1.12) for CVD, 1.25 (1.08, 1.44) for COPD and 1.12 (1.02, 1.23) for pneumonia. Similarly, each 10 µg/m3 increase in O3 was associated with adjusted HR of 1.02 (1.01, 1.03) for CVD, 1.03 (1.02, 1.05) for all stroke, and 1.04 (1.02, 1.06) for pneumonia. Conclusions Among adults in urban China, long-term exposure to ambient air pollution is associated with a higher risk of cardio-respiratory disease

    Heterogeneity in the diagnosis and prognosis of ischemic stroke subtypes: 9-year follow-up of 22,000 cases in Chinese adults

    Background: Reliable classification of ischemic stroke (IS) etiological subtypes is required in research and clinical practice, but the predictive properties of these subtypes in population studies with incomplete investigations are poorly understood. Aims: To compare the prognosis of etiologically classified IS subtypes and use machine learning (ML) to classify incompletely investigated IS cases. Methods: In a 9-year follow-up of a prospective study of 512,726 Chinese adults, 22,216 incident IS cases, confirmed by clinical adjudication of medical records, were assigned subtypes using a modified Causative Classification System for Ischemic Stroke (CCS) (large artery atherosclerosis (LAA), small artery occlusion (SAO), cardioaortic embolism (CE), or undetermined etiology) and classified by CCS as “evident,” “probable,” or “possible” IS cases. For incompletely investigated IS cases where CCS yielded an undetermined etiology, an ML model was developed to predict IS subtypes from baseline risk factors and screening for cardioaortic sources of embolism. The 5-year risks of subsequent stroke and all-cause mortality (measured using cumulative incidence functions and 1 minus Kaplan–Meier estimates, respectively) for the ML-predicted IS subtypes were compared with etiologically classified IS subtypes. Results: Among 7443 IS subtypes with evident or probable etiology, 66% had SAO, 32% had LAA, and 2% had CE, but proportions of SAO-to-LAA cases varied by region in China. CE had the highest rates of subsequent stroke and mortality (43.5% and 40.7%), followed by LAA (43.2% and 17.4%) and SAO (38.1% and 11.1%), respectively. ML provided classifications for cases with undetermined etiology and incomplete clinical data (24% of all IS cases; n = 5276), with areas under the curve (AUC) of 0.99 (0.99–1.00) for CE, 0.67 (0.64–0.70) for LAA, and 0.70 (0.67–0.73) for SAO for unseen cases. 
ML-predicted IS subtypes yielded comparable subsequent stroke and all-cause mortality rates to the etiologically classified IS subtypes. Conclusion: This study highlighted substantial heterogeneity in prognosis of IS subtypes and utility of ML approaches for classification of IS cases with incomplete clinical investigations

    A phenome-wide association study of a lipoprotein associated phospholipase A2 loss-of-function variant in 90 000 Chinese adults

    Background Lipoprotein-associated phospholipase A2 (Lp-PLA2) has been implicated in development of atherosclerosis; however, recent randomised trials of Lp-PLA2 inhibition reported no beneficial effects on vascular diseases. In East Asians, a loss-of-function variant in the PLA2G7 gene can be used to assess the effects of genetically-determined lower Lp-PLA2. Methods PLA2G7 V279F (rs76863441) was genotyped in 91 428 individuals randomly-selected from the China Kadoorie Biobank of 0.5M participants recruited in 2004-2008 from 10 regions of China, with seven years follow-up. Linear regression was used to assess effects of V279F on baseline traits. Logistic regression was conducted for a range of vascular and non-vascular diseases, including 41 ICD-10 coded disease categories. Results PLA2G7 V279F frequency was 5% overall (range 3-7% by region), and 9,691 (11%) participants had at least one loss-of-function variant. V279F was not associated with baseline blood pressure, adiposity, blood glucose, or lung function. V279F was not associated with major vascular events (7141 events; OR=0.98 per F variant, 95% CI 0.90-1.06), or other vascular outcomes, including major coronary events (922 events; 0.96, 0.79-1.18) and stroke (5967 events; 1.00, 0.92-1.09). Individuals with V279F had lower risks of diabetes (7031 events; 0.91, 0.84-0.98) and asthma (182 events; 0.53, 0.28-0.98), but there was no association after adjustment for multiple testing. Conclusions Lifelong lower Lp-PLA2 activity was not associated with major risks of vascular or non-vascular diseases in Chinese adults. Using functional genetic variants in large-scale prospective studies with linkage to a range of health outcomes is a valuable approach to inform drug development and repositioning

    Changes in SARS-CoV-2 Spike versus Nucleoprotein Antibody Responses Impact the Estimates of Infections in Population-Based Seroprevalence Studies

    SARS-CoV-2-specific antibody responses to the Spike (S) protein monomer, S protein native trimeric form or the nucleocapsid (N) proteins were evaluated in cohorts of individuals with acute infection (n=93) and in individuals enrolled in a post-infection seroprevalence population study (n=578) in Switzerland. Commercial assays specific for the S1 monomer, for the N protein and a newly developed Luminex assay using the S protein trimer were found to be equally sensitive in antibody detection in the acute infection phase samples. Interestingly, as compared to anti-S antibody responses, those against the N protein appear to wane in the post-infection cohort. Seroprevalence in a 'positive patient contacts' group (n=177) was underestimated by N protein assays by 10.9 to 32.2%, and in the 'random selected' general population group (n=311) it was reduced by up to 45% relative to S protein assays. The overall reduction in seroprevalence targeting only anti-N antibodies for the total cohort ranged from 9.4 to 31%. Of note, the use of the S protein in its native trimer form was significantly more sensitive as compared to monomeric S proteins. These results indicate that the assessment of anti-S IgG antibody responses against the native trimeric S protein should be implemented to estimate SARS-CoV-2 infections in population-based seroprevalence studies. IMPORTANCE In the present study, we have determined SARS-CoV-2-specific antibody responses in sera of acute and post-infection phase subjects. Our results indicate that antibody responses against viral S and N proteins were equally sensitive in the acute phase of infection but that responses against N appear to wane in the post-infection phase while those against S protein persist over time. 
The most sensitive serological assay in both acute and post-infection phases used the native S protein trimer as binding antigen that has significantly greater conformational epitopes for antibody binding compared to the S1 monomer protein used in other assays. We believe that these results are extremely important in order to generate correct estimates of SARS-CoV-2 infections in the general population. Furthermore, the assessment of antibody responses against the trimeric S protein will be critical to evaluate the durability of the antibody response and for the characterization of a vaccine-induced antibody response

    The burden of disease profile of residents of Nairobi's slums: Results from a Demographic Surveillance System

    BACKGROUND: With increasing urbanization in sub-Saharan Africa and poor economic performance, the growth of slums is unavoidable. About 71% of urban residents in Kenya live in slums. Slums are characteristically unplanned, underserved by social services, and their residents are largely underemployed and poor. Recent research shows that the urban poor fare worse than their rural counterparts on most health indicators, yet much about the health of the urban poor remains unknown. This study aims to quantify the burden of mortality of the residents in two Nairobi slums, using a Burden of Disease approach and data generated from a Demographic Surveillance System. METHODS: Data from the Nairobi Urban Health and Demographic Surveillance System (NUHDSS) collected between January 2003 and December 2005 were analysed. Core demographic events in the NUHDSS including deaths are updated three times a year; cause of death is ascertained by verbal autopsy and cause of death is assigned according to the ICD 10 classification. Years of Life Lost due to premature mortality (YLL) were calculated by multiplying deaths in each subcategory of sex, age group and cause of death, by the Global Burden of Disease standard life expectancy at that age. RESULTS: The overall mortality burden per capita was 205 YLL/1,000 person years. Children under the age of five years had more than four times the mortality burden of the rest of the population, mostly due to pneumonia and diarrhoeal diseases. Among the population aged five years and above, HIV/AIDS and tuberculosis accounted for about 50% of the mortality burden. CONCLUSION: Slum residents in Nairobi have a high mortality burden from preventable and treatable conditions. It is necessary to focus on these vulnerable populations since their health outcomes are comparable to or even worse than the health outcomes of rural dwellers who are often the focus of most interventions
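    The Years of Life Lost calculation described in the methods above (deaths in each sex, age group and cause subcategory multiplied by a standard life expectancy at that age) can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the life-expectancy values, age bands, death counts and person-years are all hypothetical placeholders, not the Global Burden of Disease standard table and not NUHDSS data.

```python
from collections import defaultdict

# Hypothetical remaining life expectancy (years) by age group.
# Placeholder values for illustration, NOT the GBD standard life table.
STANDARD_LIFE_EXPECTANCY = {"0-4": 80.0, "5-14": 72.0, "15-49": 50.0, "50+": 20.0}


def years_of_life_lost(deaths):
    """Sum deaths x standard life expectancy, grouped by cause of death.

    `deaths` is an iterable of (sex, age_group, cause, count) tuples,
    one per subcategory.
    """
    yll_by_cause = defaultdict(float)
    for _sex, age_group, cause, count in deaths:
        yll_by_cause[cause] += count * STANDARD_LIFE_EXPECTANCY[age_group]
    return dict(yll_by_cause)


def yll_per_1000_person_years(total_yll, person_years):
    """Express a YLL total as a rate per 1,000 person-years of observation."""
    return 1000.0 * total_yll / person_years


# Made-up death counts by (sex, age group, cause).
example_deaths = [
    ("F", "0-4", "pneumonia", 3),
    ("M", "0-4", "pneumonia", 2),
    ("M", "15-49", "tuberculosis", 4),
]
yll = years_of_life_lost(example_deaths)
total = sum(yll.values())
rate = yll_per_1000_person_years(total, person_years=10_000)
print(yll, rate)
```

    Because each death is weighted by the life expectancy at the age of death, deaths in young children contribute far more YLL per death than deaths at older ages.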