42 research outputs found

    Learning High-Order Relations for Network-Based Phenome-Genome Association Analysis

    University of Minnesota Ph.D. dissertation. August 2019. Major: Computer Science. Advisor: Rui Kuang. 1 computer file (PDF); ix, 96 pages. An organism's phenome is the expression of characteristics arising from genetic inheritance and interaction with the environment. It includes simple physical appearance and traits as well as complex diseases. In humans, understanding the relationship between such features and genetic markers gives insight into the mechanisms involved in their expression and can also help in designing targeted therapies and new drugs. In other species, such as plants, correlating phenotypes with genetic mutations and geoclimatic variables also assists in understanding global evolutionary diversity and important characteristics such as flowering time. In this thesis, we propose to use high-order machine learning methods to help analyze the phenome through its associations with biological networks and ontologies. We show that, by combining biological networks with functional annotation of genes, we can extract high-order relations that improve the discovery of new candidate associations between genes and phenotypes. We also propose to detect high-order relations among multiple genomics datasets, geoclimatic features, and interactions among genes, in order to find a feature representation that can be used to predict phenotypes successfully. Experiments on Arabidopsis thaliana show that our approach not only provides an accurate predictive tool but also offers an intuitive alternative for analyzing correlations among plant accessions, genetic markers, and geoclimatic variables. Finally, we propose a scalable approach to address the challenges that arise from using massive biological networks in phenome analysis. Our low-rank method can process massive networks with parallel computing, enabling large-scale prior knowledge to be incorporated to improve predictive power.
    Petegrosso, Raphael. (2019). Learning High-Order Relations for Network-Based Phenome-Genome Association Analysis. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/209128

    Ausência do silêncio na contemporaneidade (The Absence of Silence in Contemporary Times)

    In times of hyperconnectivity, the way information is made available, and the adaptation of formats to allow its rapid propagation, make communication increasingly image-based and noisy. The aim is to allow the greatest amount of content to be assimilated in the least time, which deprives the subject of the moments of silence that are necessary to their subjectivity. Silence is culturally regarded as negative, as an agent of censorship that bars the word. In today's noisy and deafening society, silence torments, embarrasses, maddens, and oppresses. This society establishes excess of communication as the supreme value, which implies that silence continues to be signified as negative and is deliberately avoided. The encounter between the lacking being, as explained by Lacan, and social networks, considering the construction of the subject in light of the big Other, has called into question alterities that were previously foundational to that subject. It is the establishment of another Other, represented here by social networks. Today's subject is defined by a new symbolic set, which confers on it its condition.

    Hierarchical Canonical Correlation Analysis Reveals Phenotype, Genotype, and Geoclimate Associations in Plants

    The local environment of the geographical origin of plants shaped their genetic variations through environmental adaptation. While the characteristics of the local environment correlate with the genotypes and other genomic features of the plants, they can also be indicative of genotype-phenotype associations, providing additional information relevant to environmental dependence. In this study, we investigate how the geoclimatic features from the geographical origin of the Arabidopsis thaliana accessions can be integrated with genomic features for phenotype prediction and association analysis using advanced canonical correlation analysis (CCA). In particular, we propose a novel method called hierarchical canonical correlation analysis (HCCA) to combine mutations, gene expressions, and DNA methylations with geoclimatic features for informative coprojections of the features. HCCA uses a condition number of the cross-covariance between pairs of datasets to infer a hierarchical structure for applying CCA to combine the data. In the experiments on Arabidopsis thaliana data from the 1001 Genomes and 1001 Epigenomes projects and on climatic, atmospheric, and soil environmental variables combined by CLIMtools, HCCA provided a joint representation of the genomic data and geoclimate data for better prediction of the special flowering time at 10°C (FT10) of Arabidopsis thaliana. We also extended HCCA with information from a protein-protein interaction (PPI) network to guide the feature learning by imposing network modules onto the genomic features, which are shown to be useful for identifying genes with more coherent functions correlated with the geoclimatic features. The findings in this study suggest that environmental data comprise an important component in plant phenotype analysis. HCCA is a useful data integration technique for phenotype prediction and for better understanding the interactions between gene functions and the environment, as more useful functional information is introduced by coprojections of multiple genomic datasets.

    Machine learning and statistical methods for clustering single-cell RNA-sequencing data

    Abstract: Single-cell RNA sequencing (scRNA-seq) technologies have enabled the large-scale whole-transcriptome profiling of each individual cell in a cell population. A core analysis of the scRNA-seq transcriptome profiles is to cluster the single cells to reveal cell subtypes and infer cell lineages based on the relations among the cells. This article reviews the machine learning and statistical methods for clustering scRNA-seq transcriptomes developed in the past few years. The review focuses on how conventional clustering techniques such as hierarchical clustering, graph-based clustering, mixture models, k-means, ensemble learning, neural networks and density-based clustering are modified or customized to tackle the unique challenges in scRNA-seq data analysis, such as the dropout of low-expression genes, low and uneven read coverage of transcripts, highly variable total mRNAs from single cells and ambiguous cell markers in the presence of technical biases and irrelevant confounding biological variations. We review how cell-specific normalization, the imputation of dropouts and dimension reduction methods can be applied with new statistical or optimization strategies to improve the clustering of single cells. We also introduce more advanced approaches to cluster scRNA-seq transcriptomes in time series data and multiple cell populations and to detect rare cell types. Several software packages developed to support the cluster analysis of scRNA-seq data are also reviewed and experimentally compared to evaluate their performance and efficiency. Finally, we conclude with useful observations and possible future directions in scRNA-seq data analytics. Availability: All the source code and data are available at https://github.com/kuanglab/single-cell-review.

    Transfer learning across ontologies for phenome–genome association prediction

    Abstract. Motivation: To better predict and analyze gene associations with the collection of phenotypes organized in a phenotype ontology, it is crucial to effectively model the hierarchical structure among the phenotypes in the ontology and to supplement the sparse known associations with additional training information. In this paper, we first introduce Dual Label Propagation (DLP) to impose consistent associations with the entire phenotype paths in predicting phenotype–gene associations in Human Phenotype Ontology (HPO). DLP is then used as the base model in a transfer learning framework (tlDLP) to incorporate functional annotations in Gene Ontology (GO). By simultaneously reconstructing GO term–gene associations and HPO phenotype–gene associations for all the genes in a protein–protein interaction network, tlDLP benefits from the enriched training associations indirectly through their relation with GO terms. Results: In the experiments to predict the associations between human genes and phenotypes in HPO based on a human protein–protein interaction network, both DLP and tlDLP improved the prediction of gene associations with phenotype paths in HPO in cross-validation and the prediction of the most recent associations added after the snapshot of the training data. Moreover, transfer learning through GO term–gene associations improved association predictions by a large margin for the phenotypes with no more specific known associations. Examples are also shown to demonstrate how phenotype paths in the phenotype ontology and transfer learning with Gene Ontology can improve the predictions. Availability and Implementation: Source code is available at http://compbio.cs.umn.edu/ontophenome. Supplementary information: Supplementary data are available at Bioinformatics online.
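
    Illustration (not from the paper): the abstract above does not spell out the DLP or tlDLP formulation, so the following is only a minimal sketch of standard single-network label propagation, the general idea of spreading known phenotype–gene labels over a protein–protein interaction network. The function name `label_propagation`, the four-gene toy network, and the values of `alpha` and `iterations` are all assumptions chosen for illustration.

```python
def label_propagation(adjacency, seeds, alpha=0.8, iterations=100):
    """Standard label propagation (NOT the paper's DLP/tlDLP).

    Iterates f <- alpha * S f + (1 - alpha) * y, where S is the
    symmetrically normalized adjacency matrix and y holds the seed labels.
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    # S[i][j] = A[i][j] / sqrt(d_i * d_j); entries without an edge stay 0.
    S = [
        [
            adjacency[i][j] / (degrees[i] * degrees[j]) ** 0.5
            if adjacency[i][j]
            else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    scores = list(seeds)
    for _ in range(iterations):
        scores = [
            alpha * sum(S[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * seeds[i]
            for i in range(n)
        ]
    return scores


# Hypothetical toy network: four genes connected in a chain 0 - 1 - 2 - 3.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# Gene 0 is the only gene with a known association to the phenotype.
seeds = [1.0, 0.0, 0.0, 0.0]

scores = label_propagation(adjacency, seeds)
for gene, score in enumerate(scores):
    print(f"gene {gene}: {score:.4f}")
```

    In this toy chain the scores fall off with network distance from the seed gene, which is the behavior that makes propagation useful for ranking candidate genes. Handling paths in a phenotype ontology and transferring information from GO annotations, as the abstract describes, would require the authors' actual formulation.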