
    Genome-wide associations of gene expression variation in humans

    The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12-13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis-) to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level
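    The abstract above names Bonferroni and false discovery rate correction among its three methods. As a minimal sketch of how those two thresholds differ, the following assumes the Benjamini-Hochberg step-up rule for the FDR (the abstract does not say which FDR procedure was used) and omits the permutation approach. The function names and the example p-values are illustrative and not taken from the paper.

```python
def bonferroni(pvalues, alpha=0.05):
    """Mark tests significant when p <= alpha / m (m = number of tests)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Mark tests significant under the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff = rank
    # Reject every hypothesis whose rank is at or below that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[idx] = True
    return rejected


# Illustrative p-values only (not data from the study).
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
n_bonferroni = sum(bonferroni(example))
n_bh = sum(benjamini_hochberg(example))
```

    On the illustrative input, Bonferroni keeps fewer tests than Benjamini-Hochberg, which is the usual trade-off between controlling the family-wise error rate and controlling the false discovery rate.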

    Latent class analysis variable selection

    We propose a method for selecting variables in latent class analysis, which is the most common model-based clustering method for discrete data. The method assesses a variable's usefulness for clustering by comparing two models, given the clustering variables already selected. In one model the variable contributes information about cluster allocation beyond that contained in the already selected variables, and in the other model it does not. A headlong search algorithm is used to explore the model space and select clustering variables. In simulated datasets we found that the method selected the correct clustering variables, and also led to improvements in classification performance and in accuracy of the choice of the number of classes. In two real datasets, our method discovered the same group structure with fewer variables. In a dataset from the International HapMap Project consisting of 639 single nucleotide polymorphisms (SNPs) from 210 members of different groups, our method discovered the same group structure with a much smaller number of SNPs.

    GRIMP: A web- and grid-based tool for high-speed analysis of large-scale genome-wide association using imputed data.

    The rapid growth of genome-wide association studies (GWAS), combined with now-common computationally expensive imputation, requires that large user groups have online access to high-performance computing resources capable of rapidly and efficiently analyzing millions of genetic markers for tens of thousands of individuals. Here, we present a web-based interface, called GRIMP, to run publicly available genetic software for extremely large GWAS on scalable super-computing grid infrastructures. This is of major importance for the enlargement of GWAS with the availability of whole-genome sequence data from the 1000 Genomes Project and for future whole-population efforts.

    The promoter polymorphism -232C/G of the PCK1 gene is associated with type 2 diabetes in a UK-resident South Asian population

    Background: The PCK1 gene, encoding cytosolic phosphoenolpyruvate carboxykinase (PEPCK-C), has previously been implicated as a candidate gene for type 2 diabetes (T2D) susceptibility. Rodent models demonstrate that over-expression of Pck1 can result in T2D development and a single nucleotide polymorphism (SNP) in the promoter region of human PCK1 (-232C/G) has exhibited significant association with the disease in several cohorts. Within the UK-resident South Asian population, T2D is 4 to 6 times more common than in indigenous white Caucasians. Despite this, few studies have reported on the genetic susceptibility to T2D in this ethnic group and none of these has investigated the possible effect of PCK1 variants. We therefore aimed to investigate the association between common variants of the PCK1 gene and T2D in a UK-resident South Asian population of Punjabi ancestry, originating predominantly from the Mirpur area of Azad Kashmir, Pakistan. Methods: We used TaqMan assays to genotype five tagSNPs covering the PCK1 gene, including the -232C/G variant, in 903 subjects with T2D and 471 normoglycaemic controls. Results: Of the variants studied, only the minor allele (G) of the -232C/G SNP demonstrated a significant association with T2D, displaying an OR of 1.21 (95% CI: 1.03 - 1.42, p = 0.019). Conclusion: This study is the first to investigate the association between variants of the PCK1 gene and T2D in South Asians. Our results suggest that the -232C/G promoter polymorphism confers susceptibility to T2D in this ethnic group. Trial registration: UKADS Trial Registration: ISRCTN38297969
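    The reported odds ratio, confidence interval, and p-value can be checked against one another. The sketch below assumes the interval is a symmetric Wald interval on the log-odds scale and that the p-value is two-sided; the abstract does not state which test the authors used, so this is a plausibility check and not a reconstruction of their analysis. The function name is illustrative.

```python
import math


def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Back out the Wald z statistic and two-sided p-value from an OR and its 95% CI.

    Assumes the CI is symmetric on the log scale: log(OR) +/- z_crit * SE.
    """
    # Standard error of log(OR), recovered from the width of the interval.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Values reported in the abstract for the -232C/G minor allele.
z_stat, p_value = wald_from_or_ci(1.21, 1.03, 1.42)
```

    Under these assumptions the recovered p-value comes out close to 0.02, in line with the reported p = 0.019 once rounding of the published OR and interval is taken into account.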

    Common variants of the TCF7L2 gene are associated with increased risk of type 2 diabetes mellitus in a UK-resident South Asian population

    Background: Recent studies have implicated variants of the transcription factor 7-like 2 (TCF7L2) gene in genetic susceptibility to type 2 diabetes mellitus in several different populations. The aim of this study was to determine whether variants of this gene are also risk factors for type 2 diabetes development in a UK-resident South Asian cohort of Punjabi ancestry. Methods: We genotyped four single nucleotide polymorphisms (SNPs) of TCF7L2 (rs7901695, rs7903146, rs11196205 and rs12255372) in 831 subjects with diabetes and 437 control subjects. Results: The minor allele of each variant was significantly associated with type 2 diabetes; the greatest risk of developing the disease was conferred by rs7903146, with an allelic odds ratio (OR) of 1.31 (95% CI: 1.11 – 1.56, p = 1.96 × 10⁻³). For each variant, disease risk associated with homozygosity for the minor allele was greater than that for heterozygotes, with the exception of rs12255372. To determine the effect on the observed associations of including young control subjects in our data set, we reanalysed the data using subsets of the control group defined by different minimum age thresholds. Increasing the minimum age of our control subjects resulted in a corresponding increase in OR for all variants of the gene (p ≤ 1.04 × 10⁻⁷). Conclusion: Our results support recent findings that TCF7L2 is an important genetic risk factor for the development of type 2 diabetes in multiple ethnic groups.

    Next generation analytic tools for large scale genetic epidemiology studies of complex diseases

    Over the past several years, genome‐wide association studies (GWAS) have succeeded in identifying hundreds of genetic markers associated with common diseases. However, most of these markers confer relatively small increments of risk and explain only a small proportion of familial clustering. To identify obstacles to future progress in genetic epidemiology research and provide recommendations to NIH for overcoming these barriers, the National Cancer Institute sponsored a workshop entitled “Next Generation Analytic Tools for Large‐Scale Genetic Epidemiology Studies of Complex Diseases” on September 15–16, 2010. The goal of the workshop was to facilitate discussions on (1) statistical strategies and methods to efficiently identify genetic and environmental factors contributing to the risk of complex disease; and (2) how to develop, apply, and evaluate these strategies for the design, analysis, and interpretation of large‐scale complex disease association studies in order to guide NIH in setting the future agenda in this area of research. The workshop was organized as a series of short presentations covering scientific (gene‐gene and gene‐environment interaction, complex phenotypes, and rare variants and next generation sequencing) and methodological (simulation modeling and computational resources and data management) topic areas. Specific needs to advance the field were identified during each session and are summarized. Genet. Epidemiol. 36:22–35, 2012. © 2011 Wiley Periodicals, Inc. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/93578/1/gepi20652.pd

    A second generation human haplotype map of over 3.1 million SNPs

    We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25–35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10–30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.
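    The coverage figures above are stated in terms of r², the squared correlation between alleles at two SNPs. As a minimal sketch of the standard pairwise definition, the function below computes r² from haplotype and allele frequencies for two biallelic SNPs. The function name and the example frequencies are illustrative, and the sketch does not reproduce the HapMap "average maximum r²" calculation, which takes, for each untyped SNP, its best r² with any typed SNP and averages those values.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Squared allelic correlation r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    # D measures how far the haplotype frequency departs from independence.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Illustrative frequencies only (not HapMap data).
perfect_tag = ld_r_squared(0.5, 0.5, 0.5)    # alleles always travel together
independent = ld_r_squared(0.25, 0.5, 0.5)   # no association between the SNPs
```

    An r² of 1 means one SNP is a perfect proxy for the other, while an r² of 0 means genotyping one SNP says nothing about the other.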

    Imputation-Based Analysis of Association Studies: Candidate Regions and Quantitative Traits

    We introduce a new framework for the analysis of association studies, designed to allow untyped variants to be more effectively and directly tested for association with a phenotype. The idea is to combine knowledge on patterns of correlation among SNPs (e.g., from the International HapMap project or resequencing data in a candidate region of interest) with genotype data at tag SNPs collected on a phenotyped study sample, to estimate (“impute”) unmeasured genotypes, and then assess association between the phenotype and these estimated genotypes. Compared with standard single-SNP tests, this approach results in increased power to detect association, even in cases in which the causal variant is typed, with the greatest gain occurring when multiple causal variants are present. It also provides more interpretable explanations for observed associations, including assessing, for each SNP, the strength of the evidence that it (rather than another correlated SNP) is causal. Although we focus on association studies with quantitative phenotype and a relatively restricted region (e.g., a candidate gene), the framework is applicable and computationally practical for whole genome association studies. Methods described here are implemented in a software package, Bim-Bam, available from the Stephens Lab website http://stephenslab.uchicago.edu/software.html

    Comparing Patterns of Natural Selection across Species Using Selective Signatures

    Comparing gene expression profiles over many different conditions has led to insights that were not obvious from single experiments. In the same way, comparing patterns of natural selection across a set of ecologically distinct species may extend what can be learned from individual genome-wide surveys. Toward this end, we show how variation in protein evolutionary rates, after correcting for genome-wide effects such as mutation rate and demographic factors, can be used to estimate the level and types of natural selection acting on genes across different species. We identify unusually rapidly and slowly evolving genes, relative to empirically derived genome-wide and gene family-specific background rates for 744 core protein families in 30 γ-proteobacterial species. We describe the pattern of fast or slow evolution across species as the “selective signature” of a gene. Selective signatures represent a profile of selection across species that is predictive of gene function: pairs of genes with correlated selective signatures are more likely to share the same cellular function, and genes in the same pathway can evolve in concert. For example, glycolysis and phenylalanine metabolism genes evolve rapidly in Idiomarina loihiensis, mirroring an ecological shift in carbon source from sugars to amino acids. In a broader context, our results suggest that the genomic landscape is organized into functional modules even at the level of natural selection, and thus it may be easier than expected to understand the complex evolutionary pressures on a cell

    Data analysis issues for allele-specific expression using Illumina's GoldenGate assay.

    BACKGROUND: High-throughput measurement of allele-specific expression (ASE) is a relatively new and exciting application area for array-based technologies. In this paper, we explore several data sets which make use of Illumina's GoldenGate BeadArray technology to measure ASE. This platform exploits coding SNPs to obtain relative expression measurements for alleles at approximately 1500 positions in the genome. RESULTS: We analyze data from a mixture experiment where genomic DNA samples from pairs of individuals of known genotypes are pooled to create allelic imbalances at varying levels for the majority of SNPs on the array. We observe that GoldenGate has less sensitivity at detecting subtle allelic imbalances (around 1.3 fold) compared to extreme imbalances, and note the benefit of applying local background correction to the data. Analysis of data from a dye-swap control experiment allowed us to quantify dye-bias, which can be reduced considerably by careful normalization. The need to filter the data before carrying out further downstream analysis to remove non-responding probes, which show either weak, or non-specific signal for each allele, was also demonstrated. Throughout this paper, we find that a linear model analysis of the data from each SNP is a flexible modelling strategy that allows for testing of allelic imbalances in each sample when replicate hybridizations are available. CONCLUSIONS: Our analysis shows that local background correction carried out by Illumina's software, together with quantile normalization of the red and green channels within each array, provides optimal performance in terms of false positive rates. In addition, we strongly encourage intensity-based filtering to remove SNPs which only measure non-specific signal. We anticipate that a similar analysis strategy will prove useful when quantifying ASE on Illumina's higher density Infinium BeadChips. RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.