35 research outputs found

    LIMIX: genetic analysis of multiple traits

    Multi-trait mixed models have emerged as a promising approach for joint analyses of multiple traits. In principle, the mixed model framework is remarkably general. However, current methods implement only a very narrow range of tasks in order to optimize the necessary computations. Here, we present a multi-trait modeling framework that is versatile and fast: LIMIX makes it possible to flexibly adapt mixed models to a broad range of applications with different observed and hidden covariates and variable study designs. To highlight the novel modeling aspects of LIMIX, we performed three very different genetic studies: joint GWAS of correlated blood lipid phenotypes, joint analysis of the expression levels of the multiple transcript isoforms of a gene, and pathway-based modeling of molecular traits across environments. In these applications, we show that LIMIX increases GWAS power and phenotype prediction accuracy, in particular when integrating stepwise multi-locus regression into multi-trait models and when analyzing large numbers of traits. An open source implementation of LIMIX is freely available at: https://github.com/PMBio/limix
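
The abstract attributes part of the gain to integrating stepwise multi-locus regression into multi-trait models. The sketch below illustrates only the stepwise part, under a strong simplification: ordinary least squares with one coefficient per trait, no genetic or noise random effects, and a hypothetical stopping rule (`min_rel_gain`, a minimum relative drop in residual sum of squares). `forward_stepwise` is a hypothetical name, not part of the LIMIX API.

```python
import numpy as np


def forward_stepwise(G, Y, max_snps=3, min_rel_gain=0.05):
    """Greedy forward selection of SNPs shared across all traits.

    G: (n, s) genotype matrix; Y: (n, t) phenotype matrix.
    Each step adds the SNP whose inclusion as a fixed effect (one
    coefficient per trait) most reduces the residual sum of squares
    summed over traits. Plain least squares: the random effects of a
    real multi-trait mixed model are deliberately left out.
    """
    n, s = G.shape

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        return float(np.sum((Y - X @ beta) ** 2))

    X = np.ones((n, 1))            # start from an intercept-only model
    current = rss(X)
    selected = []
    for _ in range(max_snps):
        best_j, best_rss = None, current
        for j in range(s):
            if j in selected:
                continue
            cand = rss(np.column_stack([X, G[:, j]]))
            if cand < best_rss:
                best_j, best_rss = j, cand
        # Hypothetical stopping rule: require a minimum relative RSS drop.
        if best_j is None or (current - best_rss) / current < min_rel_gain:
            break
        selected.append(best_j)
        X = np.column_stack([X, G[:, best_j]])
        current = best_rss
    return selected


# Toy data: two SNPs (4 and 17) affect both of two traits.
rng = np.random.default_rng(0)
n, s = 300, 50
G = rng.binomial(2, 0.3, size=(n, s)).astype(float)
B = np.zeros((s, 2))
B[4] = [1.0, 0.8]
B[17] = [-0.7, 0.9]
Y = G @ B + rng.normal(size=(n, 2))
selected = forward_stepwise(G, Y)
print(selected)
```

Selecting SNPs jointly across traits means a locus with moderate effects on several traits can enter the model even when no single trait would pick it up on its own.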

    Expression QTLs Mapping and Analysis: A Bayesian Perspective.

    The aim of expression Quantitative Trait Locus (eQTL) mapping is to identify DNA sequence variants that explain variation in gene expression. Given the large number of trait-associated genetic variants recently identified by genome-wide association studies (GWAS), eQTL mapping has become a useful tool for understanding the functional context in which these variants operate and, eventually, for narrowing down functional gene targets for disease. Despite its extensive application to complex (polygenic) traits and disease, the majority of eQTL studies still rely on univariate modeling strategies, i.e., testing each transcript-marker pair for association separately. However, these "one at a time" strategies (1) cannot control the number of false positives when an intricate linkage disequilibrium structure is present and (2) are often underpowered to detect the full spectrum of trans-acting regulatory effects. Here we present our viewpoint on the most recent advances in eQTL mapping approaches, with a focus on Bayesian methodology. We review the advantages of the Bayesian approach over frequentist methods and provide an empirical example of polygenic eQTL mapping to illustrate the different properties of frequentist and Bayesian methods. Finally, we discuss how multivariate eQTL mapping approaches have distinctive features with respect to the detection of polygenic effects, accuracy, and interpretability of the results.
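
The "one at a time" strategy described above can be made concrete. A minimal sketch, assuming genotype dosages and expression levels held in NumPy arrays: every marker is regressed on every transcript independently, using the identity that the slope t-statistic of simple linear regression equals r * sqrt((n - 2) / (1 - r^2)), with two-sided p-values from a normal approximation instead of the exact t distribution. The function name and the simulated data are hypothetical. The Bonferroni threshold over all m * t pairs shows how the multiple-testing burden grows with the number of pairs tested.

```python
import math

import numpy as np


def univariate_eqtl_scan(G, E):
    """Test every (marker, transcript) pair separately by simple regression.

    G: (n, m) genotype dosages; E: (n, t) expression levels.
    Returns (t_stat, p_value), each of shape (m, t).
    """
    n = G.shape[0]
    Gz = (G - G.mean(axis=0)) / G.std(axis=0)
    Ez = (E - E.mean(axis=0)) / E.std(axis=0)
    r = Gz.T @ Ez / n                              # Pearson r of every pair
    t_stat = r * np.sqrt((n - 2) / (1 - r ** 2))   # slope t-statistic, n - 2 df
    # Two-sided p-values via a normal approximation (adequate for large n).
    p_value = np.vectorize(math.erfc)(np.abs(t_stat) / math.sqrt(2))
    return t_stat, p_value


# Toy data: marker 3 regulates transcript 5; everything else is noise.
rng = np.random.default_rng(1)
n, m, t = 200, 100, 20
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
E = rng.normal(size=(n, t))
E[:, 5] += 1.0 * G[:, 3]
T_stat, P = univariate_eqtl_scan(G, E)
hits = np.argwhere(P < 0.05 / P.size)   # Bonferroni over all m * t tests
print(hits)
```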

    Modeling the polygenic architecture of complex traits

    Genomics research has grown substantially over the last few years. Advances in sequencing technology have led to a veritable flood of genome-wide data that allow us to study the genetic architecture of complex phenotypes in more detail than ever before. However, even state-of-the-art analysis methods reach their limits when effect sizes vary too strongly between markers, when confounding factors complicate the analysis, or when the dependencies between related phenotypes are ignored. The aim of this thesis is to develop several methods that can address these challenges efficiently. Our first contribution is the LMM-Lasso, a hybrid model that combines the advantages of variable selection with linear mixed models. To do so, it decomposes the phenotypic variance into two components: the first consists of individual genetic effects; the second consists of effects that are either caused by confounding factors or are genetic in nature but cannot be attributed to individual markers. The advantages of our model are that the selected coefficients are easier to interpret than those of established standard methods and that it also outperforms these methods in prediction accuracy. The second contribution is a critical evaluation of various Lasso methods that use a priori known structural information about the genetic markers and the phenotypes under study. We assess the different approaches by their prediction accuracy on simulated data and on gene expression data in yeast. Both experiments show that structural information helps only when its assumptions are justified; as soon as the assumptions are violated, using the structural information has the opposite effect.
To prevent this, we propose in our next contribution to learn the structure between the phenotypes from the data. In the third contribution, we present an efficient computational procedure for multi-task Gaussian processes that learns both the genetic relatedness between the phenotypes and the relatedness of the residuals. Our inference procedure has reduced runtime and memory requirements and thus enables us to study the joint heritability of phenotypes on large datasets. The chapter is completed by two experimental studies, a genome-wide association study in Arabidopsis thaliana and a gene expression analysis in yeast, which confirm that the new method yields better predictions. The benefits of jointly modeling variable selection and confounding factors, as well as of multi-task learning, are evident in all our experiments. While our experiments focus primarily on applications in genomics, the methods we developed are general and can also be applied in other fields.

    A Lasso multi-marker mixed model for association mapping with population structure correction

    Motivation: Exploring the genetic basis of heritable traits remains one of the central challenges in biomedical research. In traits with simple Mendelian architectures, single polymorphic loci explain a significant fraction of the phenotypic variability. However, many traits of interest appear to be subject to multifactorial control by groups of genetic loci. Accurately detecting such multivariate associations is non-trivial and often compromised by limited statistical power. At the same time, confounding influences, such as population structure, cause spurious association signals that result in false-positive findings. Results: We propose LMM-Lasso, a mixed model that allows for both multi-locus mapping and correction for confounding effects. Our approach is simple and free of tuning parameters; it effectively controls for population structure and scales to genome-wide datasets. LMM-Lasso simultaneously discovers likely causal variants and allows for multi-marker-based phenotype prediction from genotype. We demonstrate the practical use of LMM-Lasso in genome-wide association studies in Arabidopsis thaliana and linkage mapping in mouse, where our method achieves significantly more accurate phenotype prediction for 91% of the considered phenotypes. At the same time, our model dissects the phenotypic variability into components that result from individual single nucleotide polymorphism effects and population structure. Enrichment of known candidate genes suggests that the individual associations retrieved by LMM-Lasso are likely to be genuine. Availability: Code is available at http://webdav.tuebingen.mpg.de/u/karsten/Forschung/research.html
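
The combination of a mixed model with Lasso selection can be sketched in two steps, under stated assumptions: (1) estimate delta = sigma_e^2 / sigma_g^2 under a null model y ~ N(0, sigma_g^2 (K + delta I)) by a grid search over delta, then (2) rotate genotypes and phenotype by (S + delta I)^(-1/2) U^T, where K = U S U^T, so that the residuals become approximately independent, and fit an ordinary Lasso on the rotated data. The function names, the delta grid, the plain coordinate-descent Lasso, and setting the penalty as a fraction `lam_ratio` of the smallest penalty that zeroes all coefficients are illustrative choices, not the published implementation (which the abstract describes as free of tuning parameters).

```python
import numpy as np


def estimate_delta(S, y_rot, grid=np.logspace(-3, 3, 61)):
    """Grid-search delta = sigma_e^2 / sigma_g^2 maximizing the null-model
    likelihood y ~ N(0, sigma_g^2 (K + delta I)), sigma_g^2 profiled out.
    S: eigenvalues of K; y_rot: U.T @ y in the eigenbasis of K."""
    n = len(y_rot)
    best_delta, best_ll = None, -np.inf
    for delta in grid:
        w = S + delta
        sg2 = np.mean(y_rot ** 2 / w)
        ll = -0.5 * (n * np.log(2 * np.pi * sg2) + np.sum(np.log(w)) + n)
        if ll > best_ll:
            best_delta, best_ll = delta, ll
    return best_delta


def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    b = np.zeros(X.shape[1])
    r = y.copy()                       # residual y - X b (b starts at zero)
    col_sq = np.sum(X ** 2, axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            rho = X[:, j] @ r + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r += X[:, j] * (b[j] - new)
            b[j] = new
    return b


def lmm_lasso(X, y, K, lam_ratio):
    """Rotate out the K-structured random effect, then fit a Lasso."""
    S, U = np.linalg.eigh(K)
    S = np.maximum(S, 0.0)             # clip tiny negative eigenvalues
    delta = estimate_delta(S, U.T @ y)
    scale = 1.0 / np.sqrt(S + delta)   # diagonal of (S + delta I)^(-1/2)
    X_rot = scale[:, None] * (U.T @ X)
    y_rot = scale * (U.T @ y)
    lam = lam_ratio * np.max(np.abs(X_rot.T @ y_rot))
    return lasso_cd(X_rot, y_rot, lam), delta


# Toy data: SNPs 10 and 20 are causal on top of a polygenic background.
rng = np.random.default_rng(0)
n, m = 200, 500
X = rng.binomial(2, 0.3, size=(n, m)).astype(float)
X = (X - X.mean(axis=0)) / X.std(axis=0)
K = X @ X.T / m                        # realized relationship matrix
y = X[:, 10] - X[:, 20] + X @ rng.normal(size=m) / np.sqrt(m) + rng.normal(size=n)
y = y - y.mean()
b, delta = lmm_lasso(X, y, K, lam_ratio=0.5)
print(np.argsort(-np.abs(b))[:2], delta)
```

The selected coefficients correspond to individual SNP effects, while the rotated-out term absorbs variance attributed to relatedness, mirroring the decomposition the abstract describes.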

    It is all in the noise: Efficient multi-task Gaussian process inference with structured residuals

    Multi-task prediction models are widely used to couple regressors or classification models by sharing information across related tasks. A common pitfall of these models is that they assume that the output tasks are independent conditioned on the inputs. Here, we propose a multi-task Gaussian process approach to model both the relatedness between regressors and the task correlations in the residuals, in order to more accurately identify true sharing between regressors. The resulting Gaussian model has a covariance term that is the sum of Kronecker products, for which efficient parameter inference and out-of-sample prediction are feasible. On both synthetic examples and applications to phenotype prediction in genetics, we find substantial benefits of modeling structured noise compared to established alternatives.

    It is all in the noise: Efficient multi-task Gaussian process inference with structured residuals

    Multi-task prediction methods are widely used to couple regressors or classification models by sharing information across related tasks. We propose a multi-task Gaussian process approach for modeling both the relatedness between regressors and the task correlations in the residuals, in order to more accurately identify true sharing between regressors. The resulting Gaussian model has a covariance term in the form of a sum of Kronecker products, for which efficient parameter inference and out-of-sample prediction are feasible. On both synthetic examples and applications to phenotype prediction in genetics, we find substantial benefits of modeling structured noise compared to established alternatives.
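
Why a sum of two Kronecker products still allows efficient inference can be shown with a standard whitening-plus-eigendecomposition trick. A minimal sketch, assuming the covariance of vec(Y) is C ⊗ R + Σ ⊗ I (signal task covariance C, sample covariance R, structured residual task covariance Σ, independent samples) and that Σ is positive definite: whitening Σ and then rotating by the eigenvectors of R and of the whitened C makes the covariance diagonal, so the log-likelihood needs one N x N and two T x T eigendecompositions instead of factorizing the full NT x NT matrix. The function names are hypothetical; the dense version exists only to check the result on a small example.

```python
import numpy as np


def kron_sum_loglik(Y, C, R, Sigma):
    """Gaussian log-likelihood of vec(Y) with covariance C kron R + Sigma kron I.

    Y: (N, T) outputs, N samples x T tasks; vec() stacks columns.
    C, Sigma: (T, T) signal and residual task covariances.
    R: (N, N) sample covariance (e.g. a kinship or kernel matrix).
    """
    N, T = Y.shape
    s_sig, U_sig = np.linalg.eigh(Sigma)
    U_tilde = U_sig / np.sqrt(s_sig)       # U_tilde.T @ Sigma @ U_tilde = I
    s_c, U_c = np.linalg.eigh(U_tilde.T @ C @ U_tilde)
    s_r, U_r = np.linalg.eigh(R)
    A = U_tilde @ U_c                      # task-side rotation
    Y_rot = U_r.T @ Y @ A                  # P^T vec(Y) with P = A kron U_r
    D = np.outer(s_r, s_c) + 1.0           # diagonal of P^T K P, as (N, T)
    quad = np.sum(Y_rot ** 2 / D)
    # log det K = log det D - 2 log|det P|, and |det P| = prod(s_sig)^(-N/2).
    logdet = np.sum(np.log(D)) + N * np.sum(np.log(s_sig))
    return -0.5 * (quad + logdet + N * T * np.log(2 * np.pi))


def dense_loglik(Y, C, R, Sigma):
    """Reference implementation that builds the full NT x NT covariance."""
    N, T = Y.shape
    K = np.kron(C, R) + np.kron(Sigma, np.eye(N))
    y = Y.flatten(order="F")               # column-stacking vec()
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (y @ np.linalg.solve(K, y) + logdet + N * T * np.log(2 * np.pi))


rng = np.random.default_rng(0)
N, T = 30, 3
F = rng.normal(size=(N, 10))
R = F @ F.T / 10                           # sample covariance from toy features
B = rng.normal(size=(T, T))
C = B @ B.T                                # signal task covariance
E = rng.normal(size=(T, T))
Sigma = E @ E.T + 0.5 * np.eye(T)          # residual task covariance
Y = rng.normal(size=(N, T))
fast = kron_sum_loglik(Y, C, R, Sigma)
dense = dense_loglik(Y, C, R, Sigma)
print(fast, dense)
```

Because the rotations depend only on C, R, and Σ, the same diagonalization can be reused for predictions once the covariance parameters are fixed.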

    ccSVM: correcting Support Vector Machines for confounding factors in biological data classification

    Motivation: Classifying biological data into different groups is a central task of bioinformatics, for instance, predicting the function of a gene or protein, the disease state of a patient, or the phenotype of an individual based on its genotype. Support Vector Machines are a widespread approach for classifying biological data, due to their high accuracy, their ability to deal with structured data such as strings, and the ease with which they integrate various types of data. However, it is unclear how to correct for confounding factors such as population structure, age, gender, or experimental conditions in Support Vector Machine classification.