UCL Discovery

The Francis Crick Institute

Latent class analysis variable selection

Author: A.E. Raftery
A.L. McCutcheon
Adrian E. Raftery
C. Fraley
C. Keribin
C.C. Clogg
C.C. Clogg
D. Rusakov
G. Galimberti
G.J. McLachlan
J.A. Hagenaars
J.H. Gennari
L. Hubert
L.A. Goodman
Nema Dean
P.F. Lazarsfeld
R. Detrano
R.E. Kass
The International HapMap Consortium
W.M. Rand
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

We propose a method for selecting variables in latent class analysis, which is the most common model-based clustering method for discrete data. The method assesses a variable's usefulness for clustering by comparing two models, given the clustering variables already selected. In one model the variable contributes information about cluster allocation beyond that contained in the already selected variables, and in the other model it does not. A headlong search algorithm is used to explore the model space and select clustering variables. In simulated datasets we found that the method selected the correct clustering variables, and also led to improvements in classification performance and in accuracy of the choice of the number of classes. In two real datasets, our method discovered the same group structure with fewer variables. In a dataset from the International HapMap Project consisting of 639 single nucleotide polymorphisms (SNPs) from 210 members of different groups, our method discovered the same group structure with a much smaller number of SNPs.
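The headlong search described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `evidence` callback is a hypothetical stand-in for the comparison between the two models (the abstract does not say which measure is used), and the function name, `threshold`, and `max_rounds` are assumptions. The "headlong" aspect is that the first candidate that clears the threshold is accepted, rather than the best one.

```python
def headlong_select(variables, evidence, threshold=0.0, max_rounds=100):
    """Greedy headlong search over candidate clustering variables.

    evidence(v, selected) is a hypothetical callback returning how strongly
    the data favour the model in which v adds clustering information beyond
    the variables in `selected`.
    """
    selected = []
    for _ in range(max_rounds):  # guard against cycling
        changed = False
        # Inclusion step: accept the FIRST variable that clears the threshold.
        for v in variables:
            if v not in selected and evidence(v, selected) > threshold:
                selected.append(v)
                changed = True
                break
        # Exclusion step: drop the FIRST selected variable that no longer does.
        for v in list(selected):
            others = [u for u in selected if u != v]
            if evidence(v, others) <= threshold:
                selected.remove(v)
                changed = True
                break
        if not changed:
            break
    return selected


# Toy usage with made-up, fixed evidence scores (not real data).
scores = {"a": 2.0, "b": -1.0, "c": 0.5}
chosen = headlong_select(["a", "b", "c"], lambda v, sel: scores[v])
print(chosen)
```

In a real analysis the evidence for a variable would depend on the variables already selected, which is why the callback receives the current selection.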

CiteSeerX

Research Papers in Economics

Enlighten

GRIMP: A web- and grid-based tool for high-speed analysis of large-scale genome-wide association using imputed data.

Author: Altshuler
André G. Uitterlinden
Anis Abuseiris
Aulchenko
de Bakker
de Zeeuw
Fernando Rivadeneira
Frank G. Grosveld
Hofman
International HapMap Consortium
Karol Estrada
Krefting
Li
Marchini
Psaty
Tobias A. Knoch
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2009

The rapid growth of genome-wide association studies (GWAS), combined with now-common computationally expensive imputation, requires that large user groups have online access to high-performance computing resources capable of rapidly and efficiently analyzing millions of genetic markers for tens of thousands of individuals. Here, we present a web-based interface, called GRIMP, to run publicly available genetic software for extremely large GWAS on scalable super-computing grid infrastructures. This is of major importance for the enlargement of GWAS with the availability of whole-genome sequence data from the 1000 Genomes Project and for future whole-population efforts.

Erasmus University Digital Repository

EUR Research Repository

The promoter polymorphism -232C/G of the PCK1 gene is associated with type 2 diabetes in a UK-resident South Asian population

Author: A Valera
Abigail C Britten
AG Gomez-Valades
AH Barnett
Anthony H Barnett
BF Voight
CJ Willer
EG Beale
EH Hani
H Cao
H Zouali
HD Shin
I Gouni-Berthold
J Paul O'Hare
JC Barrett
L Wegner
M Ann Kelly
S Bellary
S Purcell
SD Rees
Simon D Rees
Srikanth Bellary
Sudhesh Kumar
The International HapMap Consortium
Y Dong
Publication date: 01/01/2009

Background: The PCK1 gene, encoding cytosolic phosphoenolpyruvate carboxykinase (PEPCK-C), has previously been implicated as a candidate gene for type 2 diabetes (T2D) susceptibility. Rodent models demonstrate that over-expression of Pck1 can result in T2D development and a single nucleotide polymorphism (SNP) in the promoter region of human PCK1 (-232C/G) has exhibited significant association with the disease in several cohorts. Within the UK-resident South Asian population, T2D is 4 to 6 times more common than in indigenous white Caucasians. Despite this, few studies have reported on the genetic susceptibility to T2D in this ethnic group and none of these has investigated the possible effect of PCK1 variants. We therefore aimed to investigate the association between common variants of the PCK1 gene and T2D in a UK-resident South Asian population of Punjabi ancestry, originating predominantly from the Mirpur area of Azad Kashmir, Pakistan. Methods: We used TaqMan assays to genotype five tagSNPs covering the PCK1 gene, including the -232C/G variant, in 903 subjects with T2D and 471 normoglycaemic controls. Results: Of the variants studied, only the minor allele (G) of the -232C/G SNP demonstrated a significant association with T2D, displaying an OR of 1.21 (95% CI: 1.03 - 1.42, p = 0.019). Conclusion: This study is the first to investigate the association between variants of the PCK1 gene and T2D in South Asians. Our results suggest that the -232C/G promoter polymorphism confers susceptibility to T2D in this ethnic group. Trial registration: UKADS Trial Registration: ISRCTN38297969
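An allelic odds ratio with a 95% confidence interval, of the kind reported in the Results above, can be computed from a 2x2 table of allele counts. The sketch below uses Woolf's log-based interval, which is an assumption: the abstract does not state which method the authors used. The function name and the counts in the usage lines are hypothetical and are not the study's data.

```python
import math


def allelic_odds_ratio(case_minor, case_major, ctrl_minor, ctrl_major, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a 2x2 allele table.

    Uses Woolf's method: the interval is symmetric on the log-odds scale.
    All four counts must be positive.
    """
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    # Standard error of log(OR) under Woolf's approximation.
    se_log = math.sqrt(
        1 / case_minor + 1 / case_major + 1 / ctrl_minor + 1 / ctrl_major
    )
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)


# Hypothetical allele counts, for illustration only.
or_value, lower, upper = allelic_odds_ratio(20, 10, 10, 20)
print(f"OR = {or_value:.2f} (95% CI: {lower:.2f} - {upper:.2f})")
```

An interval that excludes 1, as in the abstract's 1.03 - 1.42, corresponds to an association that is significant at roughly the 5% level.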

University of Birmingham Research Portal

Springer - Publisher Connector

Aston Publications Explorer

Warwick Research Archives Portal Repository

Common variants of the TCF7L2 gene are associated with increased risk of type 2 diabetes mellitus in a UK-resident South Asian population

Author: A Helgason
Abigail C Britten
AH Barnett
Anthony H Barnett
C Zhang
CM Damcott
GR Chandak
J Paul O'Hare
J Wang
JC Barrett
JC Florez
JC Florez
JV van Vliet-Ostaptchouk
LJ Scott
M Ann Kelly
R Saxena
R Zhang
RJ Loos
S Cauchi
S Cauchi
S Mayans
SE Humphries
SF Grant
Simon D Rees
Srikanth Bellary
Sudhesh Kumar
The International HapMap Consortium
U Smith
WHO Consultation
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

Background: Recent studies have implicated variants of the transcription factor 7-like 2 (TCF7L2) gene in genetic susceptibility to type 2 diabetes mellitus in several different populations. The aim of this study was to determine whether variants of this gene are also risk factors for type 2 diabetes development in a UK-resident South Asian cohort of Punjabi ancestry. Methods: We genotyped four single nucleotide polymorphisms (SNPs) of TCF7L2 (rs7901695, rs7903146, rs11196205 and rs12255372) in 831 subjects with diabetes and 437 control subjects. Results: The minor allele of each variant was significantly associated with type 2 diabetes; the greatest risk of developing the disease was conferred by rs7903146, with an allelic odds ratio (OR) of 1.31 (95% CI: 1.11 – 1.56, p = 1.96 × 10⁻³). For each variant, disease risk associated with homozygosity for the minor allele was greater than that for heterozygotes, with the exception of rs12255372. To determine the effect on the observed associations of including young control subjects in our data set, we reanalysed the data using subsets of the control group defined by different minimum age thresholds. Increasing the minimum age of our control subjects resulted in a corresponding increase in OR for all variants of the gene (p ≤ 1.04 × 10⁻⁷). Conclusion: Our results support recent findings that TCF7L2 is an important genetic risk factor for the development of type 2 diabetes in multiple ethnic groups.

Springer - Publisher Connector

University of Birmingham Research Portal

Aston Publications Explorer

Warwick Research Archives Portal Repository

Next generation analytic tools for large scale genetic epidemiology studies of complex diseases

Author: Albert
Andrieu
Bansal
Blot
Bochud
Breslow
Broeks
Carvajal-Carmona
Chen
Ciampa
Cirulli
Cordell
Cornelis
Cox
De Silva
Drake
Eichler
Gottesman
Green
Greene
Greenland
Han
Hill
Hoffmann
International HapMap Consortium
International HapMap Consortium
International HapMap3 Consortium
Jirtle
Kendler
Klein
Kooperberg
Kooperberg
Kraft
Lander
Langholz
Laurie
Li
Li
Liu
Liu
Madsen
Manolio
Mardis
Milne
Moore
Moore
Moore
Moore
Morgenthaler
Mukherjee
Mukherjee
Murcray
Neale
Peng
Pennisi
Piegorsch
Price
Richter
Ritchie
Ritchie
Rosenstiel
Rothman
Schadt
Schwarz
Siemiatycki
Sinnott-Armstrong
Smith
Stein
Stephens
Thomas
Thomas
Thomas
Thompson
Turner
Vansteelandt
Vineis
Weinberg
Zawistowski
Zeggini
Zhang
Zhang
Zhou
Publication venue: 'Wiley'
Publication date: 01/01/2012

Over the past several years, genome‐wide association studies (GWAS) have succeeded in identifying hundreds of genetic markers associated with common diseases. However, most of these markers confer relatively small increments of risk and explain only a small proportion of familial clustering. To identify obstacles to future progress in genetic epidemiology research and provide recommendations to NIH for overcoming these barriers, the National Cancer Institute sponsored a workshop entitled “Next Generation Analytic Tools for Large‐Scale Genetic Epidemiology Studies of Complex Diseases” on September 15–16, 2010. The goal of the workshop was to facilitate discussions on (1) statistical strategies and methods to efficiently identify genetic and environmental factors contributing to the risk of complex disease; and (2) how to develop, apply, and evaluate these strategies for the design, analysis, and interpretation of large‐scale complex disease association studies in order to guide NIH in setting the future agenda in this area of research. The workshop was organized as a series of short presentations covering scientific (gene‐gene and gene‐environment interaction, complex phenotypes, and rare variants and next generation sequencing) and methodological (simulation modeling and computational resources and data management) topic areas. Specific needs to advance the field were identified during each session and are summarized. Genet. Epidemiol. 36:22–35, 2012. © 2011 Wiley Periodicals, Inc. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/93578/1/gepi20652.pd

Carolina Digital Repository

Deep Blue Documents

A second generation human haplotype map of over 3.1 million SNPs

Author: Li Yun
The International HapMap Consortium
Publication date: 01/01/2007

We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25–35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10–30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.
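The r² statistic quoted in this abstract is the squared correlation between alleles at two loci. As a minimal sketch, it can be computed from the frequency of one haplotype and the two allele frequencies; the function and argument names below are illustrative assumptions, and the frequencies in the usage lines are made up rather than taken from HapMap.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Squared allelic correlation between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic (0 < frequency < 1)")
    # D is the deviation of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


# Made-up frequencies: perfectly correlated loci, then independent loci.
perfect = ld_r_squared(0.5, 0.5, 0.5)
independent = ld_r_squared(0.25, 0.5, 0.5)
print(perfect, independent)
```

The "maximum r²" for an untyped SNP would then be the largest such value over the typed SNPs it is compared against.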

Imputation-Based Analysis of Association Studies: Candidate Regions and Quantitative Traits

Author: Bertrand Servin
David B Allison
Matthew Stephens
The International HapMap Consortium
Publication venue: Public Library of Science
Publication date: 01/01/2007

We introduce a new framework for the analysis of association studies, designed to allow untyped variants to be more effectively and directly tested for association with a phenotype. The idea is to combine knowledge on patterns of correlation among SNPs (e.g., from the International HapMap project or resequencing data in a candidate region of interest) with genotype data at tag SNPs collected on a phenotyped study sample, to estimate (“impute”) unmeasured genotypes, and then assess association between the phenotype and these estimated genotypes. Compared with standard single-SNP tests, this approach results in increased power to detect association, even in cases in which the causal variant is typed, with the greatest gain occurring when multiple causal variants are present. It also provides more interpretable explanations for observed associations, including assessing, for each SNP, the strength of the evidence that it (rather than another correlated SNP) is causal. Although we focus on association studies with quantitative phenotype and a relatively restricted region (e.g., a candidate gene), the framework is applicable and computationally practical for whole genome association studies. Methods described here are implemented in a software package, Bim-Bam, available from the Stephens Lab website http://stephenslab.uchicago.edu/software.html

Public Library of Science (PLOS)

HAL ENVT (Ecole Nationale Vétérinaire de Toulouse)

ProdInra

HAL: Hyper Article en Ligne

The Francis Crick Institute

Comparing Patterns of Natural Selection across Species Using Selective Signatures

Author: B. Jesse Shapiro
Chimpanzee Sequencing and Analysis Consortium
David S Guttman
Eric J Alm
International HapMap Consortium
Publication venue: Public Library of Science
Publication date: 01/02/2008

Comparing gene expression profiles over many different conditions has led to insights that were not obvious from single experiments. In the same way, comparing patterns of natural selection across a set of ecologically distinct species may extend what can be learned from individual genome-wide surveys. Toward this end, we show how variation in protein evolutionary rates, after correcting for genome-wide effects such as mutation rate and demographic factors, can be used to estimate the level and types of natural selection acting on genes across different species. We identify unusually rapidly and slowly evolving genes, relative to empirically derived genome-wide and gene family-specific background rates for 744 core protein families in 30 γ-proteobacterial species. We describe the pattern of fast or slow evolution across species as the “selective signature” of a gene. Selective signatures represent a profile of selection across species that is predictive of gene function: pairs of genes with correlated selective signatures are more likely to share the same cellular function, and genes in the same pathway can evolve in concert. For example, glycolysis and phenylalanine metabolism genes evolve rapidly in Idiomarina loihiensis, mirroring an ecological shift in carbon source from sugars to amino acids. In a broader context, our results suggest that the genomic landscape is organized into functional modules even at the level of natural selection, and thus it may be easier than expected to understand the complex evolutionary pressures on a cell

Data analysis issues for allele-specific expression using Illumina's GoldenGate assay.

Author: A Gimelbrant
AC Tan
Antigone S Dimas
AS Dimas
BE Stranger
BJ Main
C Daelemans
Caroline Daelemans
D Serre
Emmanouil T Dermitzakis
GK Smyth
GK Smyth
GK Smyth
HS Lo
HT Bjornsson
International HapMap Consortium
International HapMap Consortium
J Oosting
J Staaf
JB Fan
JC Knight
K Zhang
KB Meyer
KK Dobbin
Matthew E Ritchie
Matthew S Forrest
ME Ritchie
MJ Dunning
MJ Dunning
ML Martin-Magniette
MP Lee
Panagiotis Deloukas
PH van Bilsen
PR Buckland
PV Pant
R Development Core Team
S Davis
Simon Tavaré
X Feng
Publication venue: BMC Bioinformatics
Publication date: 01/01/2010

BACKGROUND: High-throughput measurement of allele-specific expression (ASE) is a relatively new and exciting application area for array-based technologies. In this paper, we explore several data sets which make use of Illumina's GoldenGate BeadArray technology to measure ASE. This platform exploits coding SNPs to obtain relative expression measurements for alleles at approximately 1500 positions in the genome. RESULTS: We analyze data from a mixture experiment where genomic DNA samples from pairs of individuals of known genotypes are pooled to create allelic imbalances at varying levels for the majority of SNPs on the array. We observe that GoldenGate has less sensitivity at detecting subtle allelic imbalances (around 1.3 fold) compared to extreme imbalances, and note the benefit of applying local background correction to the data. Analysis of data from a dye-swap control experiment allowed us to quantify dye-bias, which can be reduced considerably by careful normalization. The need to filter the data before carrying out further downstream analysis to remove non-responding probes, which show either weak, or non-specific signal for each allele, was also demonstrated. Throughout this paper, we find that a linear model analysis of the data from each SNP is a flexible modelling strategy that allows for testing of allelic imbalances in each sample when replicate hybridizations are available. CONCLUSIONS: Our analysis shows that local background correction carried out by Illumina's software, together with quantile normalization of the red and green channels within each array, provides optimal performance in terms of false positive rates. In addition, we strongly encourage intensity-based filtering to remove SNPs which only measure non-specific signal. 
We anticipate that a similar analysis strategy will prove useful when quantifying ASE on Illumina's higher density Infinium BeadChips. RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are
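Quantile normalization of the red and green channels within one array, as recommended in the Conclusions above, forces both channels to share the same distribution of intensities. The sketch below is a plain-Python illustration under stated assumptions, not the authors' pipeline: the function name is hypothetical, ties are broken by position, and the intensities in the usage lines are made up.

```python
def quantile_normalize_channels(red, green):
    """Quantile-normalize two equal-length intensity lists against each other.

    Each value is replaced by the mean of the red and green values that share
    its rank, so both channels end up with an identical set of values.
    """
    if len(red) != len(green):
        raise ValueError("channels must have the same number of probes")
    n = len(red)
    # Probe indices ordered by intensity (sorted() is stable, so ties keep
    # their original order).
    red_order = sorted(range(n), key=red.__getitem__)
    green_order = sorted(range(n), key=green.__getitem__)
    # Reference distribution: mean of the two channels at each rank.
    reference = [(red[i] + green[j]) / 2 for i, j in zip(red_order, green_order)]
    red_norm = [0.0] * n
    green_norm = [0.0] * n
    for rank, (i, j) in enumerate(zip(red_order, green_order)):
        red_norm[i] = reference[rank]
        green_norm[j] = reference[rank]
    return red_norm, green_norm


# Made-up intensities for three probes on one array.
red_n, green_n = quantile_normalize_channels([5, 2, 3], [4, 1, 6])
print(red_n)    # [5.5, 1.5, 3.5]
print(green_n)  # [3.5, 1.5, 5.5]
```

Each probe keeps its rank within its own channel; only the values attached to those ranks change.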

Springer - Publisher Connector