
    Methodological Issues in Multistage Genome-Wide Association Studies

    Because of the high cost of commercial genotyping chip technologies, many investigations have used a two-stage design for genome-wide association studies, using part of the sample for an initial discovery of "promising" SNPs at a less stringent significance level and the remainder in a joint analysis of just these SNPs using custom genotyping. Typical cost savings of about 50% are possible with this design while obtaining comparable levels of overall type I error and power: about half the sample is used for stage I and about 0.1% of SNPs are carried forward to the second stage, with the optimal design depending primarily upon the ratio of costs per genotype for stages I and II. However, with the rapidly declining costs of the commercial panels, the generally low observed ORs of current studies, and many studies aiming to test multiple hypotheses and multiple endpoints, many investigators are abandoning the two-stage design in favor of simply genotyping all available subjects using a standard high-density panel. Concern is sometimes raised about the absence of a "replication" panel in this approach, as required by some high-profile journals, but it must be appreciated that the two-stage design is not a discovery/replication design but simply a more efficient design for discovery using a joint analysis of the data from both stages. Once a subset of highly significant associations has been discovered, a truly independent "exact replication" study of the same promising SNPs is needed in a similar population using similar methods. Comment: Published at http://dx.doi.org/10.1214/09-STS288 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)
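The cost arithmetic behind the "about 50%" figure can be sketched as follows. This is a minimal illustration that counts genotyping cost only; it does not model power or type I error, and the function name and the stage II/stage I per-genotype cost ratio of 20 are assumptions for illustration, not values from the abstract.

```python
def two_stage_cost_ratio(stage1_fraction, carry_forward, cost_ratio):
    """Genotyping cost of a two-stage design relative to a one-stage design.

    stage1_fraction: fraction of subjects genotyped on the full panel in stage I
    carry_forward:   fraction of SNPs carried forward to stage II
    cost_ratio:      per-genotype cost in stage II divided by that in stage I
                     (hypothetical input; custom genotyping usually costs
                     more per genotype than a commercial panel)
    """
    if not 0.0 <= stage1_fraction <= 1.0:
        raise ValueError("stage1_fraction must lie in [0, 1]")
    if not 0.0 <= carry_forward <= 1.0:
        raise ValueError("carry_forward must lie in [0, 1]")
    if cost_ratio < 0:
        raise ValueError("cost_ratio must be non-negative")
    # Stage I: the chosen fraction of subjects, all SNPs, unit cost per genotype.
    stage1_cost = stage1_fraction
    # Stage II: the remaining subjects, only the carried-forward SNPs.
    stage2_cost = (1.0 - stage1_fraction) * carry_forward * cost_ratio
    return stage1_cost + stage2_cost


if __name__ == "__main__":
    ratio = two_stage_cost_ratio(0.5, 0.001, 20.0)
    print(f"relative cost: {ratio:.2f}, saving: {1 - ratio:.0%}")
```

With half the sample in stage I and 0.1% of SNPs carried forward, the stage II term is small unless the cost ratio is very large, so the saving stays close to the stage II share of the sample.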

    Interethnic differences in pancreatic cancer incidence and risk factors: The Multiethnic Cohort.

    While disparity in pancreatic cancer incidence between blacks and whites has been observed, few studies have examined disparity in other ethnic minorities. We evaluated variations in pancreatic cancer incidence and assessed the extent to which known risk factors account for differences in pancreatic cancer risk among African Americans, Native Hawaiians, Japanese Americans, Latino Americans, and European Americans in the Multiethnic Cohort Study. Risk factor data were obtained from the baseline questionnaire. Cox regression was used to estimate the relative risks (RRs) and 95% confidence intervals (CIs) for pancreatic cancer associated with risk factors and ethnicity. During an average 16.9-year follow-up, 1,532 incident pancreatic cancer cases were identified among 184,559 at-risk participants. Family history of pancreatic cancer (RR 1.97, 95% CI 1.50-2.58), diabetes (RR 1.32, 95% CI 1.14-1.54), body mass index ≥30 kg/m² (RR 1.25, 95% CI 1.08-1.46), current smoking (<20 pack-years RR 1.43, 95% CI 1.19-1.73; ≥20 pack-years RR 1.76, 95% CI 1.46-2.12), and red meat intake (RR 1.17, 95% CI 1.00-1.36) were associated with pancreatic cancer. After adjustment for these risk factors, Native Hawaiians (RR 1.60, 95% CI 1.30-1.98), Japanese Americans (RR 1.33, 95% CI 1.15-1.54), and African Americans (RR 1.20, 95% CI 1.01-1.42), but not Latino Americans (RR 0.90, 95% CI 0.76-1.07), had a higher risk of pancreatic cancer compared to European Americans. Interethnic differences in pancreatic cancer risk are not fully explained by differences in the distribution of known risk factors. The greater risks in Native Hawaiians and Japanese Americans are new findings, and elucidating the causes of these high rates may improve our understanding and prevention of pancreatic cancer.

    A Kinship-Based Modification of the Armitage Trend Test to Address Hidden Population Structure and Small Differential Genotyping Errors

    BACKGROUND/AIMS: We propose a modification of the well-known Armitage trend test to address the problems associated with hidden population structure and hidden relatedness in genome-wide case-control association studies. METHODS: The new test adopts beneficial traits from three existing testing strategies (principal components, mixed models, and genomic control) while avoiding some of their disadvantageous characteristics, such as the tendency of the principal components method to over-correct in certain situations or the failure of the genomic control approach to reorder the adjusted tests based on their degree of alignment with the underlying hidden structure. The new procedure is based on Gauss-Markov estimators derived from a straightforward linear model with an imposed variance structure proportional to an empirical relatedness matrix. Conceptual and analytical similarities to and distinctions from other approaches are emphasized throughout. RESULTS: Our simulations show that the power of the proposed test is quite promising compared to the competing strategies considered. The power gains are especially large when small differential genotyping errors between cases and controls are present, a likely scenario when public controls are used in multiple studies. CONCLUSION: The proposed modified approach attains high power more consistently than the existing commonly implemented tests. Its performance improvement is most apparent when small but detectable systematic differences between cases and controls exist.
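For orientation, the standard (unmodified) Armitage trend test that this abstract takes as its starting point can be sketched as below. This is not the kinship-based modification the authors propose, whose details the abstract does not give; the function name and the additive scores (0, 1, 2) are assumptions made for illustration.

```python
import math


def armitage_trend_test(cases, controls, scores=(0, 1, 2)):
    """Standard Cochran-Armitage trend test on a 2 x k genotype table.

    cases, controls: counts per genotype class (e.g. AA, Aa, aa)
    scores:          trend weights; (0, 1, 2) assumes an additive model
    Returns (chi-square statistic on 1 df, p-value).
    """
    if not (len(cases) == len(controls) == len(scores)):
        raise ValueError("cases, controls and scores must have equal length")
    n_cases = sum(cases)
    n_controls = sum(controls)
    n_total = n_cases + n_controls
    totals = [r + s for r, s in zip(cases, controls)]

    sum_tr = sum(t * r for t, r in zip(scores, cases))       # score sum in cases
    sum_tn = sum(t * n for t, n in zip(scores, totals))      # score sum overall
    sum_t2n = sum(t * t * n for t, n in zip(scores, totals))

    # T = N * sum(t_i r_i) - R * sum(t_i n_i); Var(T) under the null follows.
    numerator = n_total * sum_tr - n_cases * sum_tn
    variance = n_cases * n_controls * (n_total * sum_t2n - sum_tn ** 2) / n_total
    if variance == 0:
        raise ValueError("test undefined: no variation in genotype or status")

    chi2 = numerator ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    stat, p = armitage_trend_test(cases=(10, 20, 30), controls=(30, 20, 10))
    print(f"chi2 = {stat:.2f}, p = {p:.2e}")
```

The statistic equals N times the squared correlation between genotype score and case status, which is why structure or relatedness that inflates that correlation also inflates the test.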

    Improved Imputation of Common and Uncommon Single Nucleotide Polymorphisms (SNPs) with a New Reference Set

    Statistical imputation of genotype data is an important technique for analysis of genome-wide association studies (GWAS). We have built a reference dataset to improve imputation accuracy for studies of individuals of primarily European descent using genotype data from the Hap1, Omni1, and Omni2.5 human SNP arrays (Illumina). Our dataset contains 2.5-3.1 million variants for 930 European, 157 Asian, and 162 African/African-American individuals. Imputation accuracy of European data from Hap660 or OmniExpress array content, measured by the proportion of variants imputed with R² > 0.8, improved by 34%, 23% and 12% for variants with MAF of 3%, 5% and 10%, respectively, compared to imputation using publicly available data from 1,000 Genomes and International HapMap projects. The improved accuracy with the use of the new dataset could increase the power for GWAS by as much as 8% relative to genotyping all variants. This reference dataset is available to the scientific community through the NCBI dbGaP portal. Future versions will include additional genotype data as well as non-European populations.