106 research outputs found

    Model Selection Approach Suggests Causal Association between 25-Hydroxyvitamin D and Colorectal Cancer

    Vitamin D deficiency has been associated with increased risk of colorectal cancer (CRC), but a causal relationship has not yet been confirmed. We investigate the direction of causation between vitamin D and CRC by extending conventional approaches to allow pleiotropic relationships and by explicitly modelling unmeasured confounders. Plasma 25-hydroxyvitamin D (25-OHD), genetic variants associated with 25-OHD and CRC, and other relevant information were available for 2645 individuals (1057 CRC cases and 1588 controls) and included in the model. We investigate whether 25-OHD is likely to be causally associated with CRC, or vice versa, by selecting the best modelling hypothesis according to Bayesian predictive scores, and we examine consistency across a range of prior assumptions. Model comparison showed a preference for a causal association between low 25-OHD and CRC over the reverse causal hypothesis. This was confirmed by the posterior mean deviances obtained for both models (11.5 natural log units in favour of the causal model), and also by deviance information criteria (DIC) computed for a range of prior distributions. Overall, models ignoring hidden confounding or pleiotropy had significantly poorer DIC scores. The results suggest a causal association between 25-OHD and colorectal cancer, and support the need for randomised clinical trials for further confirmation.
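The comparison above rests on posterior mean deviances and the deviance information criterion (DIC). A minimal sketch of a DIC comparison, using made-up posterior deviance draws and an assumed effective-parameter count rather than the study's actual models, might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def dic(deviance_samples, deviance_at_posterior_mean):
    """DIC = D_bar + p_D, where p_D = D_bar - D(theta_bar)."""
    d_bar = np.mean(deviance_samples)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d

# Hypothetical posterior deviance draws for the causal and
# reverse-causal hypotheses (illustrative, not the study's output);
# the means are separated by the 11.5-unit gap quoted above.
causal = rng.normal(3000.0, 5.0, size=2000)
reverse = rng.normal(3011.5, 5.0, size=2000)

# assume both models use roughly p_D = 8 effective parameters
dic_causal = dic(causal, causal.mean() - 8.0)
dic_reverse = dic(reverse, reverse.mean() - 8.0)
# the lower DIC favours the causal direction
```

A lower DIC indicates better expected predictive performance, so under these made-up numbers the causal model would be preferred.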

    Automated pathway and reaction prediction facilitates in silico identification of unknown metabolites in human cohort studies

    Identification of metabolites in non-targeted metabolomics continues to be a bottleneck in metabolomics studies in large human cohorts. Unidentified metabolites frequently emerge in the results of association studies linking metabolite levels to, for example, clinical phenotypes. For further analyses these unknown metabolites must be identified. Current approaches utilize chemical information, such as spectral details and fragmentation characteristics, to determine components of unknown metabolites. Here, we propose a systems biology model exploiting the internal correlation structure of metabolite levels in combination with existing biochemical and genetic information to characterize properties of unknown molecules. Levels of 758 metabolites (439 known, 319 unknown) in human blood samples of 2279 subjects were measured using a non-targeted metabolomics platform (LC-MS and GC-MS). We reconstructed the structure of biochemical pathways that are imprinted in these metabolomics data by building an empirical network model based on 1040 significant partial correlations between metabolites. We further added associations of these metabolites to 134 genes from genome-wide association studies, as well as reactions and functional relations to genes from the public database Recon 2, to the network model. From the local neighborhood in the network, we were able to predict the pathway annotation of 180 unknown metabolites. Furthermore, we classified 100 pairs of known and unknown and 45 pairs of unknown metabolites to 21 types of reactions based on their mass differences. As a proof of concept, we then looked further into the special case of predicted dehydrogenation reactions, leading us to the selection of 39 candidate molecules for 5 unknown metabolites. Finally, we could verify 2 of those candidates by applying LC-MS analyses of commercially available candidate substances. The formerly unknown metabolites X-13891 and X-13069 were shown to be 2-dodecendioic acid and 9-tetradecenoic acid, respectively. Our data-driven approach based on measured metabolite levels and genetic associations, as well as information from public resources, can be used alone or together with methods utilizing spectral patterns as a complementary, automated and powerful method to characterize unknown metabolites.
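The reaction-type classification step works by matching the mass difference between a metabolite pair against characteristic reaction masses. A hedged sketch, with a small hand-picked table of standard monoisotopic mass differences and hypothetical metabolite masses (not the paper's full table of 21 reaction types):

```python
# Characteristic monoisotopic mass differences for a few reaction
# types, in daltons (a small illustrative subset).
REACTION_MASSES = {
    "dehydrogenation (-H2)": 2.01565,
    "methylation (+CH2)": 14.01565,
    "hydroxylation (+O)": 15.99491,
}

def classify_pair(mass_a, mass_b, tol=0.005):
    """Return reaction types whose characteristic mass difference
    matches |mass_a - mass_b| within tolerance tol."""
    diff = abs(mass_a - mass_b)
    return [name for name, m in REACTION_MASSES.items()
            if abs(diff - m) <= tol]

# Two hypothetical metabolite masses differing by one H2
print(classify_pair(228.1725, 226.1569))  # -> ['dehydrogenation (-H2)']
```

In the study this matching is applied to pairs that are already linked in the partial-correlation network, which keeps the candidate space small.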

    Kernel multi-task learning using task-specific features

    In this paper we are concerned with multitask learning when task-specific features are available. We describe two ways of achieving this using Gaussian process predictors: in the first method, the data from all tasks are combined into one dataset, making use of the task-specific features. In the second method we train specific predictors for each reference task, and then combine their predictions using a gating network. We demonstrate these methods on a compiler performance prediction problem, where a task is defined as predicting the speed-up obtained when applying a sequence of code transformations to a given program.
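The first method, pooling all tasks into one dataset, amounts to concatenating each input with its task-specific features and fitting a single Gaussian process over the joint space. A minimal numpy sketch under that reading (the RBF kernel, toy tasks, and scalar task feature are illustrative assumptions, not the paper's exact setup):

```python
import numpy as np

def rbf(X1, X2, ls=1.0):
    """Squared-exponential kernel over the joint [input, task-feature] space."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_predict(Xtr, ytr, Xte, noise=1e-2):
    """GP posterior mean with a small noise/jitter term."""
    K = rbf(Xtr, Xtr) + noise * np.eye(len(Xtr))
    return rbf(Xte, Xtr) @ np.linalg.solve(K, ytr)

# Two toy tasks; each example is [x, t], where t is a task-specific
# feature, so one GP can share information across tasks.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=(40, 1))
t = np.repeat([[0.0], [1.0]], 20, axis=0)
y = np.sin(x[:, 0]) + 0.5 * t[:, 0]      # task 1 is shifted by 0.5
Xtr = np.hstack([x, t])

pred = gp_predict(Xtr, y, np.array([[0.0, 1.0]]))  # query task 1 at x = 0
```

Because the kernel operates on the concatenated representation, the correlation between tasks falls out of their distance in task-feature space rather than being fixed by hand.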

    Kernelized Infomax Clustering

    We propose a simple information-theoretic clustering approach based on maximizing the mutual information I(x, y) between the unknown cluster labels y and the training patterns x with respect to the parameters of specifically constrained encoding distributions. The constraints are chosen such that patterns are likely to be clustered similarly if they lie close to specific (unknown) vectors in the feature space. The method may be conveniently applied to learning the optimal affinity matrix, which corresponds to learning the parameters of the kernelized encoder. The procedure does not require computing eigenvalues or inverses of the Gram matrices, which makes it potentially attractive for clustering large data sets.
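The objective I(x, y) can be estimated from the encoder alone as H(y) − E_x[H(y|x)]. A small sketch of that estimate, assuming a softmax-over-distances encoder as the constrained encoding distribution (the prototype vectors and data are illustrative, and the optimization over parameters is omitted):

```python
import numpy as np

def soft_assign(X, prototypes, beta=2.0):
    """Constrained encoder p(y|x): softmax of negative squared
    distances to (unknown) prototype vectors."""
    d2 = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    logits = -beta * d2
    logits -= logits.max(1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(1, keepdims=True)

def mutual_information(p_y_given_x):
    """I(x, y) = H(y) - E_x[H(y|x)] under the empirical data distribution."""
    eps = 1e-12
    p_y = p_y_given_x.mean(0)
    h_y = -(p_y * np.log(p_y + eps)).sum()
    h_y_x = -(p_y_given_x * np.log(p_y_given_x + eps)).sum(1).mean()
    return h_y - h_y_x

# Two well-separated blobs: prototypes sitting on the blobs give a
# higher I(x, y) than prototypes collapsed near the origin.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-3, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
good = soft_assign(X, np.array([[-3.0, -3.0], [3.0, 3.0]]))
bad = soft_assign(X, np.array([[0.0, 0.0], [0.1, 0.1]]))
```

Maximizing this quantity over the encoder's parameters, rather than just evaluating it, is what the proposed method does.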

    Variational Information Maximization in Gaussian Channels

    Recently, we introduced a simple variational bound on mutual information that resolves some of the difficulties in applying information theory to machine learning. Here we study a specific application to Gaussian channels. It is well known that PCA may be viewed as the solution to maximizing information transmission between a high-dimensional vector and its low-dimensional representation. However, such results are based on assumptions of Gaussianity of the sources. In this paper, we show how our mutual information bound, when applied in this arena, yields PCA solutions without the need for the Gaussian assumption. Furthermore, it naturally generalizes to providing an objective function for kernel PCA, enabling principled selection of kernel parameters.
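For a linear projection into a Gaussian channel, maximizing the information objective reduces to choosing the direction of largest projected variance, i.e. the leading principal component. A sketch of that reduction on synthetic data (the data and the comparison direction are illustrative; the paper's variational bound itself is not reproduced here):

```python
import numpy as np

# Centred data with one dominant direction of variance.
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
X -= X.mean(0)

# Leading eigenvector of the sample covariance = PCA solution.
cov = X.T @ X / len(X)
vals, vecs = np.linalg.eigh(cov)
w_pca = vecs[:, -1]

# The information-optimal linear code maximizes projected variance,
# so the PCA direction beats an arbitrary unit direction.
proj_var = lambda w: np.var(X @ w)
w_rand = np.array([1.0, 1.0]) / np.sqrt(2)
```

The kernel PCA generalization mentioned above replaces the linear projection with a feature-space one, with the same variance-maximizing structure.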

    Apolipoprotein CIII and N-terminal prohormone B-type natriuretic peptide as independent predictors for cardiovascular disease in type 2 diabetes

    Background and aims: Developing sparse panels of biomarkers for cardiovascular disease in type 2 diabetes would enable risk stratification for clinical decision making and selection into clinical trials. We examined the individual and joint performance of five candidate biomarkers for incident cardiovascular disease (CVD) in type 2 diabetes that an earlier discovery study had yielded. Methods: Apolipoprotein CIII (apoCIII), N-terminal prohormone B-type natriuretic peptide (NT-proBNP), high-sensitivity Troponin T (hsTnT), Interleukin-6, and Interleukin-15 were measured in baseline serum samples from the Collaborative Atorvastatin Diabetes Study (CARDS) of atorvastatin versus placebo. Among 2105 persons with type 2 diabetes and a median age of 62.9 years (range 39.2–77.3), there were 144 incident CVD (acute coronary heart disease or stroke) cases during the maximum 5-year follow-up. We used Cox proportional hazards models to identify biomarkers associated with incident CVD and the area under the receiver operating characteristic curve (AUROC) to assess overall model prediction. Results: Three of the biomarkers were singly associated with incident CVD independently of other risk factors: NT-proBNP (hazard ratio per standardised unit 2.02, 95% confidence interval [CI] 1.63, 2.50), apoCIII (1.34, 95% CI 1.12, 1.60) and hsTnT (1.40, 95% CI 1.16, 1.69). When combined in a single model, only NT-proBNP and apoCIII were independent predictors of CVD, together increasing the AUROC using Framingham risk variables from 0.661 to 0.745. Conclusions: The biomarkers NT-proBNP and apoCIII substantially increment the prediction of CVD in type 2 diabetes beyond that obtained with the variables used in the Framingham risk score.
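The AUROC used above to assess incremental prediction can be computed from risk scores via the Mann-Whitney rank-sum identity. A hedged sketch with toy risk scores (illustrative numbers only, not the trial's data):

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney identity:
    the probability that a random case outranks a random non-case."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy risk scores: a base (Framingham-style) model versus the same
# model with an added biomarker term (illustrative numbers only).
labels = [1, 1, 1, 0, 0, 0, 0]
base = [0.60, 0.40, 0.30, 0.50, 0.35, 0.20, 0.10]
plus = [0.80, 0.60, 0.50, 0.45, 0.30, 0.20, 0.10]
print(auroc(base, labels), auroc(plus, labels))  # 0.75 1.0
```

An AUROC gain like the reported 0.661 to 0.745 means the biomarker-augmented score reorders cases above non-cases more often than the base score does.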

    Computational Semantics with Functional Programming, by Jan van Eijck and Christina Unger

    One of the fundamental tasks of science is to find explainable relationships between observed phenomena. One approach to this task that has received attention in recent years is based on probabilistic graphical modelling with sparsity constraints on model structures. In this paper, we describe two new approaches to Bayesian inference of sparse structures of Gaussian graphical models (GGMs). One is based on a simple modification of the cutting-edge block Gibbs sampler for sparse GGMs, which results in significant computational gains in high dimensions. The other method is based on a specific construction of the Hamiltonian Monte Carlo sampler, which results in further significant improvements. We compare our fully Bayesian approaches with the popular regularisation-based graphical LASSO, and demonstrate significant advantages of the Bayesian treatment under the same computing costs. We apply the methods to a broad range of simulated data sets, and a real-life financial data set.
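In a Gaussian graphical model, missing edges correspond to zero entries of the precision matrix, equivalently zero partial correlations. A brief sketch estimating partial correlations for a simple chain structure (synthetic data; neither of the paper's samplers is implemented here):

```python
import numpy as np

def partial_correlations(X):
    """Partial correlation matrix from the inverse sample covariance;
    near-zero entries correspond to missing edges in a GGM."""
    prec = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Chain structure x0 -> x1 -> x2: x0 and x2 correlate marginally,
# but their partial correlation given x1 is near zero, so the
# recovered graph has no x0-x2 edge.
rng = np.random.default_rng(4)
x0 = rng.normal(size=5000)
x1 = x0 + 0.5 * rng.normal(size=5000)
x2 = x1 + 0.5 * rng.normal(size=5000)
pcor = partial_correlations(np.column_stack([x0, x1, x2]))
```

Sparse Bayesian inference and the graphical LASSO both aim to decide which of these entries are exactly zero, rather than merely small, in high dimensions.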
