374 research outputs found

    Bayesian model search and multilevel inference for SNP association studies

    Technological advances in genotyping have given rise to hypothesis-based association studies of increasing scope. As a result, the scientific hypotheses addressed by these studies have become more complex and more difficult to address using existing analytic methodologies. Obstacles to analysis include inference in the face of multiple comparisons, complications arising from correlations among the SNPs (single nucleotide polymorphisms), choice of their genetic parametrization and missing data. In this paper we present an efficient Bayesian model search strategy that searches over the space of genetic markers and their genetic parametrization. The resulting method for Multilevel Inference of SNP Associations, MISA, allows computation of multilevel posterior probabilities and Bayes factors at the global, gene and SNP level, with the prior distribution on SNP inclusion in the model providing an intrinsic multiplicity correction. We use simulated data sets to characterize MISA's statistical power, and show that MISA has higher power to detect association than standard procedures. Using data from the North Carolina Ovarian Cancer Study (NCOCS), MISA identifies variants that were not identified by standard methods and have been externally "validated" in independent studies. We examine sensitivity of the NCOCS results to prior choice and method for imputing missing data. MISA is available in an R package on CRAN. Comment: Published at http://dx.doi.org/10.1214/09-AOAS322 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    MULTIPLE MODEL EVALUATION ABSENT THE GOLD STANDARD VIA MODEL COMBINATION

    We describe a method for evaluating an ensemble of predictive models given a sample of observations comprising the model predictions and the outcome event measured with error. Our formulation allows us to simultaneously estimate measurement error parameters, the true outcome (also known as the gold standard) and a relative weighting of the predictive scores. We describe conditions necessary to estimate the gold standard and for these estimates to be calibrated, and detail how our approach is related to, but distinct from, standard model combination techniques. We apply our approach to data from a study to evaluate a collection of BRCA1/BRCA2 gene mutation prediction scores. In this example, genotype is measured with error by one or more genetic assays. We estimate the true genotype for each individual in the dataset, the operating characteristics of the commonly used genotyping procedures and a relative weighting of the scores. Finally, we compare the scores against the gold standard genotype and find that Mendelian scores are, on average, the more refined and better calibrated of those considered, and that the comparison is sensitive to measurement error in the gold standard.

    Hosts of avian brood parasites have evolved egg signatures with elevated information content.

    Hosts of brood-parasitic birds must distinguish their own eggs from parasitic mimics, or pay the cost of mistakenly raising a foreign chick. Egg discrimination is easier when different host females of the same species each lay visually distinctive eggs (egg 'signatures'), which helps to foil mimicry by parasites. Here, we ask whether brood parasitism is associated with lower levels of correlation between different egg traits in hosts, making individual host signatures more distinctive and informative. We used entropy as an index of the potential information content encoded by nine aspects of colour, pattern and luminance of eggs of different species in two African bird families (Cisticolidae parasitized by cuckoo finches Anomalospiza imberbis, and Ploceidae by diederik cuckoos Chrysococcyx caprius). Parasitized species showed consistently higher entropy in egg traits than did related, unparasitized species. Decomposing entropy into two variation components revealed that this was mainly driven by parasitized species having lower levels of correlation between different egg traits, rather than higher overall levels of variation in each individual egg trait. This suggests that irrespective of the constraints that might operate on individual egg traits, hosts can further improve their defensive 'signatures' by arranging suites of egg traits into unpredictable combinations. EMC was supported by the Pomona College-Downing College Student Exchange Scholarship, MS by a BBSRC David Phillips Research Fellowship (BB/G022887/1), and CNS by a Royal Society Dorothy Hodgkin Fellowship, a BBSRC David Phillips Research Fellowship (BB/J014109/1), and the DST-NRF Centre of Excellence at the Percy FitzPatrick Institute. This is the final version of the article. It first appeared from Royal Society Publishing via http://dx.doi.org/10.1098/rspb.2015.059
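The split of entropy into a per-trait variation component and a between-trait correlation component can be illustrated with a minimal sketch. It assumes the traits are jointly Gaussian, in which case the differential entropy is exactly the sum of the marginal entropies plus half the log-determinant of the correlation matrix. The abstract does not state which entropy estimator the authors used, so the Gaussian form and the function name `gaussian_entropy_decomposition` are assumptions made here for illustration only.

```python
import numpy as np

def gaussian_entropy_decomposition(traits):
    """Return (total, variation, correlation) entropy terms in nats.

    traits: array of shape (n_eggs, n_traits).
    Assumes a multivariate Gaussian model (an illustrative assumption).
    """
    x = np.asarray(traits, dtype=float)
    k = x.shape[1]
    cov = np.cov(x, rowvar=False)
    variances = np.diag(cov)
    corr = cov / np.sqrt(np.outer(variances, variances))

    # Variation component: sum of the marginal Gaussian entropies.
    variation = 0.5 * np.sum(np.log(2 * np.pi * np.e * variances))

    # Correlation component: 0.5 * log det(R), always <= 0.
    # It is closer to zero when traits are less correlated.
    _, logdet_corr = np.linalg.slogdet(corr)
    correlation = 0.5 * logdet_corr

    # Total joint entropy, computed directly from the covariance matrix.
    _, logdet_cov = np.linalg.slogdet(cov)
    total = 0.5 * (k * np.log(2 * np.pi * np.e) + logdet_cov)
    return total, variation, correlation
```

Under this model, two species with the same per-trait variances differ in total entropy only through the correlation term, which is the pattern the abstract attributes to parasitized hosts.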

    OPTIMIZED CROSS-STUDY ANALYSIS OF MICROARRAY-BASED PREDICTORS

    Background: Microarray-based gene expression analysis is widely used in cancer research to discover molecular signatures for cancer classification and prediction. In addition to numerous independent profiling projects, a number of investigators have analyzed multiple published data sets for purposes of cross-study validation. However, the diverse microarray platforms and technical approaches make direct comparisons across studies difficult, and without means to identify aberrant data patterns, less than optimal. To address this issue, we previously developed an integrative correlation approach to systematically address agreement of gene expression measurements across studies, providing a basis for cross-study validation analysis. Here we generalize this methodology to provide a metric for evaluating the overall efficacy of preprocessing and cross-referencing, and explore optimal combinations of filtering and cross-referencing strategies. We operate in the context of validating prognostic breast cancer gene expression signatures on data reported by three different groups, each using a different platform. Results: To evaluate overall cross-platform reproducibility in the context of a specific prediction problem, we suggest integrative association, that is, the cross-study correlation of gene-specific measures of association with the phenotype predicted. Specifically, in this paper we use the correlation among the Cox proportional hazard coefficients for association of gene expression to relapse free survival (RFS). Gene filtering by integrative correlation to select reproducible genes emerged as the key factor to increase the integrative association, while alternative methods of gene cross-referencing and gene filtering proved only to modestly improve the overall reproducibility. Patient selection was another major factor affecting the validation process. In particular, in one of the studies considered, gene expression association with RFS varied across subsets of patients that differ by their ascertainment criteria. One of the subsets proved to be highly consistent with other studies, while others showed significantly lower consistency. Third, as expected, use of cluster-specific mean expression profiles in the Cox model yielded more generalizable results than expression data from individual genes. Finally, by using our approach we were able to validate the association between the breast cancer molecular classes proposed by Sorlie et al. and RFS. Conclusions: This paper provides a simple, practical and comprehensive technique for measuring consistency of molecular classification results across microarray platforms, without requiring subjective judgments about membership of samples in putative clusters. This methodology will be of value in consistently typing breast and other cancers across different studies and platforms in the future. Although the tumor subtypes considered here have been previously validated by their proponents, this is the first independent validation, and the first to include the Affymetrix platform.
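The idea of integrative association, the cross-study correlation of gene-specific association measures, can be sketched as follows. This is a simplified illustration, not the authors' implementation: it substitutes the per-gene Pearson correlation with a continuous outcome for the Cox proportional hazard coefficients used in the paper, and the function names `gene_association` and `integrative_association` are hypothetical.

```python
import numpy as np

def gene_association(expr, outcome):
    """Per-gene Pearson correlation with the outcome.

    expr: array of shape (n_genes, n_samples); outcome: shape (n_samples,).
    Stand-in for a Cox coefficient (assumption made for illustration).
    """
    e = expr - expr.mean(axis=1, keepdims=True)
    o = outcome - outcome.mean()
    return (e @ o) / (np.linalg.norm(e, axis=1) * np.linalg.norm(o))

def integrative_association(expr_a, out_a, expr_b, out_b):
    """Correlate gene-level association measures between two studies.

    Rows of expr_a and expr_b must refer to the same cross-referenced genes.
    """
    assoc_a = gene_association(expr_a, out_a)
    assoc_b = gene_association(expr_b, out_b)
    return np.corrcoef(assoc_a, assoc_b)[0, 1]
```

A value near 1 indicates that genes associated with the phenotype in one study show a similar association in the other; filtering to reproducible genes before this step is what the abstract reports as the main way to raise it.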

    Effects of Stone Size on the Comminution Process and Efficiency in Shock Wave Lithotripsy

    The effects of stone size on the process and comminution efficiency in shock wave lithotripsy (SWL) are investigated by experiments, numerical simulations, and scale analysis. Cylindrical BegoStone phantoms with approximately equal height and diameter of 4, 7, or 10 mm, in a total aggregated mass of about 1.5 g, were treated in an electromagnetic shock wave lithotripter field. The resultant stone comminution (SC) was found to correlate closely with the average peak pressure, P+(avg), incident on the stones. The P+(avg) threshold to initiate stone fragmentation in water increased from 7.9 to 8.8 to 12.7 MPa, respectively, when the stone size decreased from 10 to 7 to 4 mm. Similar changes in the P+(avg) threshold were observed for the 7- and 10-mm stones treated in 1,3-butanediol, where cavitation is suppressed, suggesting that the observed size dependency is due to changes in stress distribution within stones of different sizes. Moreover, the slope of the correlation curve between SC and ln(P+(avg)) in water increased with decreasing stone size, while the opposite trend was observed in 1,3-butanediol. The progression of stone comminution in SWL showed a size dependency, with the 7- and 10-mm stones fragmented into progressively smaller pieces while a significant portion (> 30%) of the 4-mm stones were stalemated within the size range of 2.8 to 4 mm even after 1,000 shocks. Analytical scaling considerations suggest size-dependent fragmentation behaviour, a hypothesis further supported by numerical model calculations that exhibit changing patterns of constructive and destructive wave interference, and thus variations in the maximum tensile stress or stress integral produced in cylindrical and spherical stones of different sizes.

    Common genetic variation in cellular transport genes and epithelial ovarian cancer (EOC) risk

    Background: Defective cellular transport processes can lead to aberrant accumulation of trace elements, iron, small molecules and hormones in the cell, which in turn may promote the formation of reactive oxygen species, promoting DNA damage and aberrant expression of key regulatory cancer genes. As DNA damage and uncontrolled proliferation are hallmarks of cancer, including epithelial ovarian cancer (EOC), we hypothesized that inherited variation in the cellular transport genes contributes to EOC risk. Methods: In total, DNA samples were obtained from 14,525 case subjects with invasive EOC and from 23,447 controls from 43 sites in the Ovarian Cancer Association Consortium (OCAC). Two hundred seventy-nine SNPs, representing 131 genes, were genotyped using an Illumina Infinium iSelect BeadChip as part of the Collaborative Oncological Gene-environment Study (COGS). SNP analyses were conducted using unconditional logistic regression under a log-additive model, and the FDR q<0.2 was applied to adjust for multiple comparisons. Results: The most significant evidence of an association for all invasive cancers combined and for the serous subtype was observed for SNP rs17216603 in the iron transporter gene HEPH (invasive: OR = 0.85, P = 0.00026; serous: OR = 0.81, P = 0.00020); this SNP was also associated with the borderline/low malignant potential (LMP) tumors (P = 0.021). Other genes significantly associated with EOC histological subtypes (p<0.05) included the UGT1A (endometrioid), SLC25A45 (mucinous), SLC39A11 (low malignant potential), and SERPINA7 (clear cell carcinoma). In addition, 1785 SNPs in six genes (HEPH, MGST1, SERPINA, SLC25A45, SLC39A11 and UGT1A) were imputed from the 1000 Genomes Project and examined for association with INV EOC in white-European subjects. The most significant imputed SNP was rs117729793 in SLC39A11 (per allele, OR = 2.55, 95% CI = 1.5-4.35, p = 5.66 × 10^-4). Conclusion: These results, generated on a large cohort of women, revealed associations between inherited cellular transport gene variants and risk of EOC histologic subtypes.
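The multiple-comparison adjustment at FDR q<0.2 can be illustrated with a short sketch. The abstract does not name the FDR procedure, so the Benjamini-Hochberg step-up method used below is an assumption, and `bh_qvalues` is a hypothetical helper name.

```python
import numpy as np

def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(q_sorted, 0.0, 1.0)
    return q

def significant_at(pvalues, q_threshold=0.2):
    """Boolean mask of tests retained at the given FDR threshold."""
    return bh_qvalues(pvalues) < q_threshold
```

In a setting like the one described, `pvalues` would hold one p-value per genotyped SNP from the logistic regression, and `significant_at(pvalues, 0.2)` would flag the SNPs carried forward.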

    Common variants at the CHEK2 gene locus and risk of epithelial ovarian cancer

    Genome-wide association studies have identified 20 genomic regions associated with risk of epithelial ovarian cancer (EOC), but many additional risk variants may exist. Here, we evaluated associations between common genetic variants [single nucleotide polymorphisms (SNPs) and indels] in DNA repair genes and EOC risk. We genotyped 2896 common variants at 143 gene loci in DNA samples from 15 397 patients with invasive EOC and controls. We found evidence of associations with EOC risk for variants at the FANCA, EXO1, E2F4, E2F2, CREB5 and CHEK2 genes (P ≤ 0.001). The strongest risk association was for CHEK2 SNP rs17507066 with serous EOC (P = 4.74 × 10^-7). Additional genotyping and imputation of genotypes from the 1000 Genomes Project identified a slightly more significant association for CHEK2 SNP rs6005807 (r^2 with rs17507066 = 0.84, odds ratio (OR) 1.17, 95% CI 1.11-1.24, P = 1.1 × 10^-7). We identified 293 variants in the region with likelihood ratios of less than 1:100 for representing the causal variant. Functional annotation identified 25 candidate SNPs that alter transcription factor binding sites within regulatory elements active in EOC precursor tissues. In The Cancer Genome Atlas dataset, CHEK2 gene expression was significantly higher in primary EOCs compared to normal fallopian tube tissues (P = 3.72 × 10^-8). We also identified an association between genotypes of the candidate causal SNP rs12166475 (r^2 = 0.99 with rs6005807) and CHEK2 expression (P = 2.70 × 10^-8). These data suggest that common variants at 22q12.1 are associated with risk of serous EOC and point to CHEK2 as a plausible target susceptibility gene.