132 research outputs found

    Phenotype prediction using biologically interpretable neural networks on multi-cohort multi-omics data

    Integrating multi-omics data into predictive models has the potential to enhance predictive accuracy, which is essential for precision medicine. In this study, we developed interpretable predictive models for multi-omics data by employing neural networks informed by prior biological knowledge, referred to as visible networks. These neural networks offer insights into the decision-making process and can reveal novel perspectives on the biological mechanisms underlying traits and complex diseases. We tested performance, interpretability and generalizability for inferring smoking status, subject age and LDL levels using genome-wide RNA expression and CpG methylation data from the blood of the BIOS consortium (four population cohorts, total N = 2940). The consistency of diagnostic performance and interpretation was assessed in a cohort-wise cross-validation setting. Performance was consistently high for predicting smoking status, with an overall mean AUC of 0.95 (95% CI: 0.90–1.00), and interpretation revealed the involvement of well-replicated genes such as AHRR, GPR15 and LRRN3. LDL-level predictions generalized to only a single cohort, with an R² of 0.07 (95% CI: 0.05–0.08). Age was inferred with a mean error of 5.16 (95% CI: 3.97–6.35) years, with the genes COL11A2, AFAP1, OTUD7A, PTPRN2, ADARB2 and CD34 consistently predictive. For both regression tasks, we found that multi-omics networks improved performance, stability and generalizability compared with interpretable single-omic networks. We believe that visible neural networks have great potential for multi-omics analysis: they combine multi-omic data elegantly, are interpretable, and generalize well to data from different cohorts.
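The core idea of a visible network, restricting connections to those supported by prior biological annotation, can be sketched as a masked dense layer. This is a minimal NumPy illustration, not the study's implementation; the mask, the layer size, the tanh activation and the CpG/transcript layout are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy prior-knowledge mask (hypothetical): 6 input features -> 3 gene nodes.
# Rows 0, 2, 4 stand for CpG methylation values and rows 1, 3, 5 for RNA
# expression values; each gene node receives one CpG and one transcript, so
# the two omics layers are combined only through annotated edges.
# mask[i, j] = 1 means feature i is connected to gene node j.
mask = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
], dtype=float)


class MaskedDense:
    """Dense layer whose weights are forced to zero wherever the mask is 0."""

    def __init__(self, mask, rng):
        self.mask = mask
        self.weights = rng.normal(scale=0.1, size=mask.shape) * mask
        self.bias = np.zeros(mask.shape[1])

    def forward(self, x):
        # Re-apply the mask on every pass so that, even after weight updates,
        # unannotated connections cannot carry signal.
        return np.tanh(x @ (self.weights * self.mask) + self.bias)


layer = MaskedDense(mask, rng)
x = rng.normal(size=(4, 6))          # 4 hypothetical samples, 6 features
gene_activations = layer.forward(x)  # shape (4, 3): one column per gene node
```

Perturbing an input feature changes only the gene node it is annotated to, which is what lets each hidden unit be read as a gene.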

    Triangulating molecular evidence to prioritize candidate causal genes at established atopic dermatitis loci

    Genome-wide association studies for atopic dermatitis (AD) have identified 25 reproducible loci. We attempt to prioritize candidate causal genes at these loci using extensive molecular resources compiled into a bioinformatics pipeline. We identified a list of 103 molecular resources relevant to AD aetiology, including expression, protein and DNA methylation QTL datasets in skin or immune-relevant tissues, which were tested for overlap with GWAS signals. This was combined with functional annotation using regulatory variant prediction and features such as promoter-enhancer interactions, expression studies and variant fine-mapping. For each gene at each locus, we condensed the evidence into a prioritization score. Across the investigated loci, we detected significant enrichment of genes with adaptive immune regulatory function and epidermal barrier formation among the top prioritized genes. At 8 loci, we were able to prioritize a single candidate gene (IL6R, ADO, PRR5L, IL7R, ETS1, INPP5D, MDM1, TRAF3). In addition, at 6 of the 25 loci, our analysis prioritizes less familiar candidates (SLC22A5, IL2RA, MDM1, DEXI, ADO, STMN3). Our analysis provides support for previously implicated genes at several AD GWAS loci, as well as evidence for plausible additional candidates at others, which may represent potential targets for drug discovery.
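The abstract does not say how evidence is condensed into a prioritization score. As a purely hypothetical illustration, one simple scheme sums weighted evidence types per gene and ranks genes within each locus; the weights, evidence labels, locus and gene names below are invented and are not the study's scoring.

```python
from collections import defaultdict

# Hypothetical evidence weights; the study's actual scheme may differ entirely.
WEIGHTS = {
    "eqtl": 2.0,
    "pqtl": 2.0,
    "mqtl": 1.0,
    "regulatory_variant": 1.0,
    "promoter_enhancer": 1.0,
    "expression_study": 0.5,
    "fine_mapping": 1.5,
}


def prioritize(evidence):
    """evidence: iterable of (locus, gene, evidence_type) tuples.

    Returns {locus: [(gene, score), ...]} sorted by descending score.
    Unknown evidence types contribute 0.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for locus, gene, kind in evidence:
        scores[locus][gene] += WEIGHTS.get(kind, 0.0)
    return {
        locus: sorted(genes.items(), key=lambda kv: kv[1], reverse=True)
        for locus, genes in scores.items()
    }


# Toy input with placeholder loci and genes.
evidence = [
    ("locus_1", "GENE_A", "eqtl"),
    ("locus_1", "GENE_A", "fine_mapping"),
    ("locus_1", "GENE_B", "mqtl"),
    ("locus_2", "GENE_C", "pqtl"),
    ("locus_2", "GENE_C", "regulatory_variant"),
    ("locus_2", "GENE_D", "expression_study"),
]
ranking = prioritize(evidence)
top_genes = {locus: genes[0][0] for locus, genes in ranking.items()}
```

An additive score keeps each gene's total traceable to the individual resources behind it, at the cost of treating evidence types as independent.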

    Mendelian randomization integrating GWAS and eQTL data reveals genetic determinants of complex and clinical traits

    Genome-wide association studies (GWAS) have identified thousands of variants associated with complex traits, but their biological interpretation often remains unclear. Most of these variants overlap with expression QTLs, indicating their potential involvement in the regulation of gene expression. Here, we propose a transcriptome-wide summary statistics-based Mendelian Randomization approach (TWMR) that uses multiple SNPs as instruments and multiple gene expression traits as exposures simultaneously. Applied to 43 human phenotypes, it uncovers 3,913 putatively causal gene-trait associations, 36% of which have no genome-wide significant SNP nearby in previous GWAS. Using independent association summary statistics, we find that the majority of these loci were missed by GWAS owing to limited power. Noteworthy among these links is BSCL2, associated with educational attainment and known to carry mutations leading to a Mendelian form of encephalopathy. We also find pleiotropic causal effects suggestive of mechanistic connections. TWMR better accounts for pleiotropy and has the potential to identify biological mechanisms underlying complex traits.
    Peer reviewed
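Using several SNPs as instruments for several genes at once amounts to a generalized least-squares, multivariable MR estimate, α̂ = (Eᵀ C⁻¹ E)⁻¹ Eᵀ C⁻¹ γ. The sketch below assumes that form, with E the SNP-on-expression effects, γ the SNP-on-trait effects and C the LD matrix among instruments; it computes only the point estimate on noise-free toy data and is not the authors' code.

```python
import numpy as np


def twmr_estimate(E, gamma, C):
    """Generalized least-squares multivariable MR point estimate.

    alpha_hat = (E^T C^-1 E)^-1 E^T C^-1 gamma

    E     : (n_snps, n_genes) standardized SNP effects on each gene's expression
    gamma : (n_snps,) standardized SNP effects on the trait
    C     : (n_snps, n_snps) LD correlation matrix among the SNP instruments
    """
    C_inv_E = np.linalg.solve(C, E)          # C^-1 E without forming C^-1
    C_inv_gamma = np.linalg.solve(C, gamma)  # C^-1 gamma
    return np.linalg.solve(E.T @ C_inv_E, E.T @ C_inv_gamma)


# Toy, noise-free example: 8 SNPs, 2 genes, mildly correlated instruments.
rng = np.random.default_rng(42)
E = rng.normal(size=(8, 2))
C = 0.8 * np.eye(8) + 0.2 * np.ones((8, 8))  # positive definite LD matrix
true_alpha = np.array([0.3, -0.1])
gamma = E @ true_alpha
alpha_hat = twmr_estimate(E, gamma, C)
```

Because the two genes are estimated jointly, an instrument that affects both is not credited to either gene alone, which is one way a multivariable design can absorb some pleiotropy.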

    Epigenome-wide meta-analysis of DNA methylation and childhood asthma

    Background: Epigenetic mechanisms, including methylation, can contribute to childhood asthma. Identifying DNA methylation profiles in asthmatic patients can inform disease pathogenesis. Objective: We sought to identify differential DNA methylation in newborns and children related to childhood asthma. Methods: Within the Pregnancy And Childhood Epigenetics consortium, we performed epigenome-wide meta-analyses of school-age asthma in relation to CpG methylation (Illumina 450K) in blood measured either in newborns, in prospective analyses, or cross-sectionally in school-aged children. We also identified differentially methylated regions. Results: In newborns (8 cohorts, 668 cases), 9 CpGs (and 35 regions) were differentially methylated (epigenome-wide significance, false discovery rate <0.05) in relation to asthma development. In a cross-sectional meta-analysis of asthma and methylation in children (9 cohorts, 631 cases), we identified 179 CpGs (false discovery rate <0.05) and 36 differentially methylated regions. In replication studies of methylation in other tissues, most of the 179 CpGs discovered in blood replicated, despite smaller sample sizes, in studies of nasal respiratory epithelium or eosinophils. Pathway analyses highlighted enrichment for asthma-relevant immune processes and overlap in pathways enriched in both newborns and children. Gene expression correlated with methylation at most loci. Functional annotation supports a regulatory effect on gene expression at many asthma-associated CpGs. Several implicated genes are targets for approved or experimental drugs, including IL5RA and KCNH2. Conclusion: Novel loci differentially methylated in newborns represent potential biomarkers of risk of asthma by school age. Cross-sectional associations in children can reflect both risk for and effects of disease. Asthma-related differential methylation in blood in children was substantially replicated in eosinophils and respiratory epithelium.
    Peer reviewed
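The abstract reports significance as a false discovery rate below 0.05 but does not name the procedure. The Benjamini-Hochberg step-up rule is a common choice and is assumed in this sketch; it is not necessarily what the consortium used.

```python
import numpy as np


def benjamini_hochberg(pvalues, alpha=0.05):
    """Boolean mask of tests significant at FDR < alpha (BH step-up procedure)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # The k-th smallest p-value is compared with alpha * k / m.
    thresholds = alpha * np.arange(1, m + 1) / m
    below = ranked <= thresholds
    significant = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up: reject every test up to the largest rank meeting its threshold.
        k = np.max(np.nonzero(below)[0])
        significant[order[: k + 1]] = True
    return significant


# Ten hypothetical CpG p-values.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
sig = benjamini_hochberg(pvals)
```

Here only the two smallest p-values pass, because 0.039 exceeds its rank-3 threshold of 0.015 and no later p-value meets its own threshold.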

    Genomic and phenotypic insights from an atlas of genetic effects on DNA methylation

    Characterizing genetic influences on DNA methylation (DNAm) provides an opportunity to understand mechanisms underpinning gene regulation and disease. In the present study, we describe results of DNAm quantitative trait locus (mQTL) analyses on 32,851 participants, identifying genetic variants associated with DNAm at 420,509 DNAm sites in blood. We present a database of >270,000 independent mQTLs, of which 8.5% comprise long-range (trans) associations. Identified mQTL associations explain 15–17% of the additive genetic variance of DNAm. We show that the genetic architecture of DNAm levels is highly polygenic. Using shared genetic control between distal DNAm sites, we constructed networks, identifying 405 discrete genomic communities enriched for genomic annotations and complex traits. Shared genetic variants are associated with both DNAm levels and complex diseases, but only in a minority of cases do these associations reflect causal relationships from DNAm to trait or vice versa, indicating a more complex genotype-phenotype map than previously anticipated.
    Peer reviewed
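Building a network from shared genetic control means linking DNAm sites that are associated with the same variant. The abstract speaks of communities; this sketch swaps in the simplest grouping, connected components via union-find, and is not the study's network method. SNP and CpG identifiers are placeholders.

```python
from collections import defaultdict


def shared_snp_groups(mqtl_pairs):
    """Group CpG sites that share at least one mQTL SNP (connected components).

    mqtl_pairs: iterable of (snp_id, cpg_id) associations.
    Returns a list of sets of CpG ids.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    cpgs_by_snp = defaultdict(list)
    for snp, cpg in mqtl_pairs:
        cpgs_by_snp[snp].append(cpg)
        find(cpg)  # register every CpG, including singletons
    for cpgs in cpgs_by_snp.values():
        # All CpGs controlled by the same SNP end up in one group.
        for other in cpgs[1:]:
            union(cpgs[0], other)

    groups = defaultdict(set)
    for cpg in parent:
        groups[find(cpg)].add(cpg)
    return list(groups.values())


pairs = [("rs1", "cg1"), ("rs1", "cg2"), ("rs2", "cg2"),
         ("rs2", "cg3"), ("rs3", "cg4")]
groups = shared_snp_groups(pairs)
```

Connected components merge everything reachable through any shared SNP; a community-detection algorithm would instead split densely linked subgroups apart.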

    Tobacco smoking is associated with DNA methylation of diabetes susceptibility genes

    AIMS/HYPOTHESIS: Tobacco smoking, a risk factor for diabetes, is an established modifier of DNA methylation. We hypothesised that tobacco smoking modifies DNA methylation of genes previously identified for diabetes. METHODS: We annotated CpG sites available on the Illumina Human Methylation 450K array to diabetes genes previously identified by genome-wide association studies (GWAS), and investigated them for an association with smoking by comparing current smokers with never smokers. The discovery study consisted of 630 individuals (Bonferroni-corrected significance threshold p = 1.4 × 10⁻⁵), and we sought replication in an independent sample of 674 individuals. The replicated sites were tested for association with nearby genetic variants, gene expression, and fasting glucose and insulin levels. RESULTS: We annotated 3,620 CpG sites to the genes identified in the GWAS on type 2 diabetes. Comparing current smokers with never smokers, we found 12 differentially methylated CpG sites, of which five replicated: cg23161492 within ANPEP (p = 1.3 × 10⁻¹²); cg26963277 (p = 1.2 × 10⁻⁹), cg01744331 (p = 8.0 × 10⁻⁶) and cg16556677 (p = 1.2 × 10⁻⁵) within KCNQ1; and cg03450842 (p = 3.1 × 10⁻⁸) within ZMIZ1. The effect of smoking on DNA methylation at the replicated CpG sites attenuated after smoking cessation. Increased DNA methylation at cg23161492 was associated with decreased gene expression levels of ANPEP (p = 8.9 × 10⁻⁵). rs231356-T, which was associated with hypomethylation of cg26963277 (KCNQ1), was associated with higher odds of diabetes (OR 1.06, p = 1.3 × 10⁻⁵). Additionally, hypomethylation of cg26963277 was associated with lower fasting insulin levels (p = 0.04). CONCLUSIONS/INTERPRETATION: Tobacco smoking is associated with differential DNA methylation of the diabetes risk genes ANPEP, KCNQ1 and ZMIZ1. Our study highlights potential biological mechanisms connecting tobacco smoking to excess risk of type 2 diabetes.
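The discovery threshold of p = 1.4 × 10⁻⁵ is consistent with a Bonferroni correction of a 0.05 family-wise error rate over the 3,620 annotated CpG sites. The abstract does not spell out this arithmetic, so α = 0.05 is an assumption of this check.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / n_tests


# 3,620 CpG sites were annotated to GWAS diabetes genes; alpha = 0.05 is assumed.
threshold = bonferroni_threshold(0.05, 3620)
print(f"{threshold:.2e}")  # 1.38e-05
```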

    Refined mapping of autoimmune disease associated genetic variants with gene expression suggests an important role for non-coding RNAs

    Genome-wide association and fine-mapping studies in 14 autoimmune diseases (AID) have implicated more than 250 loci in one or more of these diseases. As more than 90% of AID-associated SNPs are intergenic or intronic, pinpointing the causal genes is challenging. We performed a systematic analysis to link 460 SNPs associated with 14 AID to causal genes using transcriptomic data from 629 blood samples. We were able to link 71 (39%) of the AID SNPs to two or more nearby genes, providing evidence that multiple causal genes exist at some AID loci. While 54 of the AID loci are shared by one or more AID, 17% of them do not share candidate causal genes. In addition to finding novel genes such as ULK3, we also implicate novel disease mechanisms and pathways, such as autophagy, in celiac disease pathogenesis. Furthermore, 42 of the AID SNPs specifically affected the expression of 53 non-coding RNA genes. To further understand how the non-coding genome contributes to AID, the SNPs were linked to functional regulatory elements, suggesting a model in which AID genes are regulated by a network of chromatin-looping and non-coding RNA interactions. The looping model also explains why a causal candidate gene is not necessarily the gene closest to the AID SNP, which was the case nearly 50% of the time.

    Correction for both common and rare cell types in blood is important to identify genes that correlate with age

    Background: Aging is a multifactorial process that affects multiple tissues and is characterized by changes in homeostasis over time, leading to increased morbidity. Whole blood gene expression signatures have been associated with aging and have been used to gain information on its biological mechanisms, which are still not fully understood. However, blood is composed of many cell types whose proportions vary with age. As a result, previously observed associations between gene expression levels and aging might be driven by cell type composition rather than intracellular aging mechanisms. To address this, previous aging studies already accounted for major cell types, but the possibility remains that the reported associations are false positives driven by less prevalent cell subtypes. Results: Here, we compared the regression model from our previous work to an extended model that corrects for 33 additional white blood cell subtypes. Both models were applied to whole blood gene expression data from 3165 individuals from the general population (age range 18–81 years). We found that the new model fits the data better, and it identified fewer genes associated with aging (625, compared with 2808 for the initial model; P ≤ 2.5 × 10⁻⁶). Moreover, 511 genes (~18% of the 2808 genes identified by the initial model) were found by both models, indicating that the other previously reported genes could be proxies for less abundant cell types. In particular, functional enrichment of the genes identified by the new model highlighted pathways and GO terms specifically associated with platelet activity. Conclusions: We conclude that gene expression analyses in blood strongly benefit from correction for both common and rare blood cell types, and we recommend using blood-cell count estimates as standard covariates when studying whole blood gene expression.
    Molecular Epidemiology
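Why cell-type correction can remove an apparent age effect can be shown with ordinary least squares on simulated data. In this sketch a single hypothetical cell-subtype proportion rises with age and fully drives expression; the study's models used many more covariates (33 additional subtypes), and all numbers here are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
age = rng.uniform(18, 81, n)
# Hypothetical rare cell subtype whose proportion rises with age.
cell_prop = 0.01 + 0.0005 * age + rng.normal(0, 0.005, n)
# Simulated expression driven only by cell composition, not by age itself.
expr = 50.0 * cell_prop + rng.normal(0, 0.1, n)


def age_coefficient(y, age, covariates):
    """OLS coefficient of age in y ~ 1 + age + covariates."""
    X = np.column_stack([np.ones(len(y)), age] + list(covariates))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


naive = age_coefficient(expr, age, [])              # about 0.025: spurious age effect
adjusted = age_coefficient(expr, age, [cell_prop])  # about 0: explained by cell mix
```

Without the covariate, the regression attributes the cell-composition shift to age; once the proportion is included, the age coefficient collapses toward zero.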

    Novel Blood Pressure Locus and Gene Discovery Using Genome-Wide Association Study and Expression Data Sets From Blood and the Kidney

    Elevated blood pressure is a major risk factor for cardiovascular disease and has a substantial genetic contribution. Genetic variation influencing blood pressure has the potential to identify new pharmacological targets for the treatment of hypertension. To discover additional novel blood pressure loci, we used 1000 Genomes Project-based imputation in 150,134 European ancestry individuals and sought significant evidence for independent replication in a further 228,245 individuals. We report 6 new signals of association in or near HSPB7, TNXB, LRP12, LOC283335, SEPT9, and AKT2, and provide new replication evidence for a further 2 signals in EBF2 and NFKBIA. Combining large whole-blood gene expression resources totaling 12,607 individuals, we investigated all novel and previously reported signals and identified 48 genes with evidence for involvement in blood pressure regulation that are significant in multiple resources. Three novel kidney-specific signals were also detected. These robustly implicated genes may provide new leads for therapeutic innovation.

    Epigenome-Wide Association Study of Tic Disorders

    Tic disorders are moderately heritable common psychiatric disorders that can be highly troubling, both in childhood and in adulthood. In this study, we report results obtained in the first epigenome-wide association study (EWAS) of tic disorders. The subjects were participants in surveys of the Netherlands Twin Register (NTR) and the NTR biobank project. Tic disorders were measured with a self-report version of the Yale Global Tic Severity Scale Abbreviated version (YGTSS-ABBR), included in the 8th wave of NTR data collection (2008). DNA methylation data consisted of 411,169 autosomal methylation sites assessed by the Illumina Infinium HumanMethylation450 BeadChip Kit (HM450k array). Phenotype and DNA methylation data were available for 1,678 subjects (mean age = 41.5 years). No probes reached genome-wide significance (p < 1.2 × 10⁻⁷). The strongest associated probe was cg15583738, located in an intergenic region on chromosome 8 (p = 1.98 × 10⁻⁶). Several of the top-ranking probes (p < 1 × 10⁻⁴) were in or near genes previously associated with neurological disorders (e.g., GABBR1, BLM and ADAM10), warranting their further investigation in relation to tic disorders. The top significantly enriched gene ontology (GO) terms among higher-ranking methylation sites included anatomical structure morphogenesis (GO:0009653, p = 4.6 × 10⁻¹⁵), developmental process (GO:0032502, p = 2.96 × 10⁻¹²) and cellular developmental process (GO:0048869, p = 1.96 × 10⁻¹²). Overall, these results provide a first insight into the epigenetic mechanisms of tic disorders. This first study assesses the role of DNA methylation in tic disorders and lays the foundations for future work aiming to unravel the biological mechanisms underlying the architecture of this disorder. Copyright © The Author(s) 2015
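The genome-wide threshold of p < 1.2 × 10⁻⁷ is consistent with a Bonferroni correction of α = 0.05 over the 411,169 autosomal probes. The abstract does not name the correction, so this is an inference from the numbers rather than a stated method:

```latex
\alpha_{\text{Bonf}} = \frac{0.05}{411\,169} \approx 1.22 \times 10^{-7}
```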