A novel approach to simulate gene-environment interactions in complex diseases
Background: Complex diseases are multifactorial traits caused by both genetic and environmental factors. They account for the majority of human diseases and include those with the largest prevalence and mortality (cancer, heart disease, obesity, etc.). Although a large amount of information has been collected about both genetic and environmental risk factors, there are few examples of studies of their interactions in the epidemiological literature. One reason may be incomplete knowledge of the power of the statistical methods designed to search for risk factors and their interactions in these data sets. An improvement in this direction would lead to a better understanding and description of gene-environment interactions. To this end, a possible strategy is to test the different statistical methods on data sets where the underlying phenomenon is completely known and fully controllable, for example simulated ones.
Results: We present a mathematical approach that models gene-environment interactions. With this method it is possible to generate simulated populations having gene-environment interactions of any form, involving any number of genetic and environmental factors and also allowing non-linear interactions such as epistasis. In particular, we implemented a simple version of this model in the Gene-Environment iNteraction Simulator (GENS), a tool designed to simulate case-control data sets in which an interaction between one gene and one environmental factor influences the disease risk. The main aims were to allow population characteristics to be entered using standard epidemiological measures and to implement constraints that make the simulator's behaviour biologically meaningful.
Conclusions: With the multi-logistic model implemented in GENS it is possible to simulate case-control samples of complex diseases in which gene-environment interactions influence the disease risk. The user has full control of the main characteristics of the simulated population, and a Monte Carlo process introduces random variability. A knowledge-based approach reduces the complexity of the mathematical model by using reasonable biological constraints and makes the simulation more understandable in biological terms. Simulated data sets can be used to assess novel statistical methods or to evaluate statistical power when designing a study.
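As a rough illustration of the kind of simulation this abstract describes, the sketch below draws a case-control sample in which one genetic factor and one environmental factor interact. It is not the GENS implementation: it assumes an ordinary logistic risk model rather than the multi-logistic model of the paper, and the function names, coefficients, allele frequency, and exposure rate are all hypothetical values chosen for the example.

```python
import math
import random


def disease_risk(g, e, b0=-3.0, b_g=0.3, b_e=0.5, b_ge=0.8):
    """Logistic disease risk for genotype g (0, 1 or 2 risk alleles)
    and binary exposure e (0 or 1). All coefficients are hypothetical;
    b_ge is the gene-environment interaction term."""
    logit = b0 + b_g * g + b_e * e + b_ge * g * e
    return 1.0 / (1.0 + math.exp(-logit))


def simulate_case_control(n_cases, n_controls, maf=0.2, p_exposed=0.3, seed=0):
    """Monte Carlo sampling: draw individuals from the population until
    the requested numbers of cases and controls have been collected."""
    rng = random.Random(seed)
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        # Genotype under Hardy-Weinberg equilibrium: two independent alleles.
        g = int(rng.random() < maf) + int(rng.random() < maf)
        e = int(rng.random() < p_exposed)
        affected = rng.random() < disease_risk(g, e)
        if affected and len(cases) < n_cases:
            cases.append((g, e))
        elif not affected and len(controls) < n_controls:
            controls.append((g, e))
    return cases, controls


cases, controls = simulate_case_control(n_cases=50, n_controls=50)
```

Because the interaction is known by construction, a data set generated this way can be passed to any candidate statistical method to check whether it recovers the effect.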
A Covering Method for Detecting Genetic Associations between Rare Variants and Common Phenotypes
Genome wide association (GWA) studies, which test for association between common genetic markers and a disease phenotype, have shown varying degrees of success. While many factors could potentially confound GWA studies, we focus on the possibility that multiple, rare variants (RVs) may act in concert to influence disease etiology. Here, we describe an algorithm for RV analysis, RARECOVER. The algorithm combines a disparate collection of RVs with low effect and modest penetrance. Further, it does not require the rare variants be adjacent in location. Extensive simulations over a range of assumed penetrance and population attributable risk (PAR) values illustrate the power of our approach over other published methods, including the collapsing and weighted-collapsing strategies. To showcase the method, we apply RARECOVER to re-sequencing data from a cohort of 289 individuals at the extremes of Body Mass Index distribution (NCT00263042). Individual samples were re-sequenced at two genes, FAAH and MGLL, known to be involved in endocannabinoid metabolism (187Kbp for 148 obese and 150 controls). The RARECOVER analysis identifies exactly one significantly associated region in each gene, each about 5 Kbp in the upstream regulatory regions. The data suggests that the RVs help disrupt the expression of the two genes, leading to lowered metabolism of the corresponding cannabinoids. Overall, our results point to the power of including RVs in measuring genetic associations.
Funding: National Science Foundation (U.S.) (grant IIS-0810905); National Institutes of Health (U.S.) (U19 AG023122-05); National Institutes of Health (U.S.) (R01 MH078151-03); Louis & Harold Price Foundation; National Institutes of Health (U.S.) (N01 MH22005); National Institutes of Health (U.S.) (U01-DA024417-01); National Institutes of Health (U.S.) (P50 MH081755-01); National Institutes of Health (U.S.) (R01 AG030474-02); National Institutes of Health (U.S.) (N01 MH022005); National Institutes of Health (U.S.) (R01 HL089655-02); National Institutes of Health (U.S.) (R01 MH080134-03); National Institutes of Health (U.S.) (U54 CA143906-01); National Institutes of Health (U.S.) (UL1 RR025774-03); Scripps Genomic Medicine Program; National Human Genome Research Institute (U.S.) (Grant Number T32 HG002295)
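The collapsing strategy that the RARECOVER abstract names as a comparison method can be sketched in a few lines. This is only a minimal illustration of that baseline, not of RARECOVER itself: each individual is reduced to a carrier indicator (at least one rare allele at any site), and carrier counts in cases and controls are compared with a one-sided Fisher exact test. The function names and the toy genotypes are hypothetical and were invented for the example.

```python
from math import comb


def collapse(genotypes):
    """Return 1 for each individual carrying at least one rare allele
    at any site, else 0. Each individual is a list of per-site counts."""
    return [int(any(site > 0 for site in person)) for person in genotypes]


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact test for enrichment of carriers in cases.
    a = case carriers, b = case non-carriers,
    c = control carriers, d = control non-carriers."""
    n_cases = a + b
    n_carriers = a + c
    n = a + b + c + d
    upper = min(n_cases, n_carriers)
    # Hypergeometric tail: probability of seeing a or more case carriers.
    tail = sum(
        comb(n_carriers, k) * comb(n - n_carriers, n_cases - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, n_cases)


# Hypothetical toy data: 4 cases and 4 controls genotyped at 3 rare sites.
case_genotypes = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0]]
control_genotypes = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 1, 0]]

case_carriers = sum(collapse(case_genotypes))
control_carriers = sum(collapse(control_genotypes))
p_value = fisher_one_sided(
    case_carriers,
    len(case_genotypes) - case_carriers,
    control_carriers,
    len(control_genotypes) - control_carriers,
)
```

Note that collapsing pools every rare variant in a fixed region; the abstract's point is that RARECOVER instead combines a selected subset of variants, which this sketch does not attempt.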
Validation of pooled genotyping on the Affymetrix 500 k and SNP6.0 genotyping platforms using the polynomial-based probe-specific correction
DOI: 10.1186/1471-2156-10-82 (BMC Genetics)
Identification of a Shared Genetic Susceptibility Locus for Coronary Heart Disease and Periodontitis
Recent studies indicate a mutual epidemiological relationship between coronary heart disease (CHD) and periodontitis. Both diseases are associated with similar risk factors and are characterized by a chronic inflammatory process. In a candidate-gene association study, we identify a genetic susceptibility locus shared by both diseases. We confirm the known association of two neighboring linkage disequilibrium regions on human chromosome 9p21.3 with CHD and show the additional strong association of these loci with the risk of aggressive periodontitis. For the lead SNP of the main associated linkage disequilibrium region, rs1333048, the odds ratio under the autosomal-recessive mode of inheritance is 1.99 (95% confidence interval 1.33–2.94; P = 6.9×10⁻⁴) for generalized aggressive periodontitis, and 1.72 (1.06–2.76; P = 2.6×10⁻²) for localized aggressive periodontitis. The two associated linkage disequilibrium regions map to the sequence of the large antisense noncoding RNA ANRIL, which partly overlaps regulatory and coding sequences of CDKN2A/CDKN2B. A closely located diabetes-associated variant was independent of the CHD and periodontitis risk haplotypes. Our study demonstrates that CHD and periodontitis are genetically related by at least one susceptibility locus, which is possibly involved in ANRIL activity and is independent of diabetes-associated risk variants within this region. Elucidation of the interplay of ANRIL transcript variants and their involvement in increased susceptibility to the interactive diseases CHD and periodontitis promises new insight into the underlying shared pathogenic mechanisms of these complex common diseases.
Contribution of SLC30A8 variants to the risk of type 2 diabetes in a multi-ethnic population: a case control study
A yeast phenomic model for the gene interaction network modulating CFTR-ΔF508 protein biogenesis
A Bayesian Method for Detecting and Characterizing Allelic Heterogeneity and Boosting Signals in Genome-Wide Association Studies
The standard paradigm for the analysis of genome-wide association studies involves carrying out association tests at both typed and imputed SNPs. These methods will not be optimal for detecting the signal of association at SNPs that are not currently known or in regions where allelic heterogeneity occurs. We propose a novel association test, complementary to the SNP-based approaches, that attempts to extract further signals of association by explicitly modeling and estimating both unknown SNPs and allelic heterogeneity at a locus. At each site we estimate the genealogy of the case-control sample by taking advantage of the HapMap haplotypes across the genome. Allelic heterogeneity is modeled by allowing more than one mutation on the branches of the genealogy. Our use of Bayesian methods allows us to assess directly the evidence for a causative SNP not well correlated with known SNPs and for allelic heterogeneity at each locus. Using simulated data and real data from the WTCCC project, we show that our method (i) produces a significant boost in signal and accurately identifies the form of the allelic heterogeneity in regions where it is known to exist, (ii) can suggest new signals that are not found by testing typed or imputed SNPs, and (iii) can provide more accurate estimates of effect sizes in regions of association.
