561 research outputs found

    Convolutional LSTM Networks for Subcellular Localization of Proteins

    Machine learning is widely used to analyze biological sequence data. Non-sequential models such as SVMs or feed-forward neural networks are often used, although they have no natural way of handling sequences of varying length. Recurrent neural networks such as the long short-term memory (LSTM) model, on the other hand, are designed to handle sequences. In this study we demonstrate that LSTM networks predict the subcellular location of proteins from the protein sequence alone with high accuracy (0.902), outperforming current state-of-the-art algorithms. We further improve performance by introducing convolutional filters, and we experiment with an attention mechanism that lets the LSTM focus on specific parts of the protein. Lastly, we introduce new visualizations of both the convolutional filters and the attention mechanism and show how they can be used to extract biologically relevant knowledge from the LSTM networks.

    Algorithm engineering for optimal alignment of protein structure distance matrices

    Protein structural alignment is an important problem in computational biology. In this paper, we present the first successes in provably optimal pairwise alignment of protein inter-residue distance matrices, using the popular Dali scoring function. We introduce the structural alignment problem formally, which enables us to express a variety of scoring functions used in previous work as special cases in a unified framework. Further, we propose the first mathematical model for computing optimal structural alignments based on dense inter-residue distance matrices. To this end, we reformulate the problem as a special graph problem and give a tight integer linear programming model. We then present algorithm engineering techniques to handle the huge integer linear programs of real-life distance matrix alignment problems. Applying these techniques, we can compute provably optimal Dali alignments for the very first time.

    On combining Big Data and machine learning to support eco-driving behaviours

    Conscious use of the battery is one of the key elements to consider while driving an electric vehicle. Hence, giving drivers information about it can help them drive in a way that optimizes energy consumption. In electric vehicles equipped with regenerative brakes, the driver’s braking style can make a significant difference. In this paper, we propose an approach that combines big data and machine learning techniques, with the aim of improving the driver’s braking style through visual elements (displayed in the vehicle dashboard, as a Human–Machine Interface) that encourage eco-driving behaviours. We have designed and developed a system prototype that exploits big data coming from an electric vehicle together with a machine learning algorithm. We then conducted a set of tests with simulated and real data, and here we discuss the results, which can open interesting discussions about the use of big data together with machine learning to improve drivers’ awareness of eco-behaviours.

    Evaluation of pre-processing on the meta-analysis of DNA methylation data from the Illumina HumanMethylation450 BeadChip platform

    Introduction: Meta-analysis is a powerful means for leveraging the hundreds of experiments being run worldwide into more statistically powerful analyses. This is also true for the analysis of omic data, including genome-wide DNA methylation. In particular, thousands of DNA methylation profiles generated using the Illumina 450k are stored in the publicly accessible Gene Expression Omnibus (GEO) repository. Often, however, the intensity values produced by the BeadChip (raw data) are not deposited, so only pre-processed values (obtained after computational manipulation) are available. Pre-processing may differ among studies and may then affect meta-analysis by introducing non-biological sources of variability. Material and methods: To systematically investigate the effect of pre-processing on meta-analysis, we analysed four different collections of DNA methylation samples (datasets), each composed of two subsets, for which raw data from controls (i.e. healthy subjects) and cases (i.e. patients) are available. We pre-processed the data from each dataset with nine of the most common pipelines found in the literature. Moreover, we evaluated the performance of regRCPqn, a modification of the RCP algorithm that aims to improve data consistency. For each combination of pre-processing (9 × 9), we first evaluated the between-sample variability among control subjects and then identified genomic positions that are differentially methylated between cases and controls (differential analysis). Results and conclusion: The pre-processing of DNA methylation data affects both the between-sample variability and the loci identified as differentially methylated, and the effects of pre-processing are strongly dataset-dependent. By contrast, application of our renormalization algorithm regRCPqn (i) reduces variability and (ii) increases agreement between meta-analysed datasets, both critical components of data harmonization.

    Defining the Molecular Basis of Tumor Metabolism: a Continuing Challenge Since Warburg's Discovery

    Cancer cells are the product of genetic disorders that alter crucial intracellular signaling pathways associated with the regulation of cell survival, proliferation, differentiation and death mechanisms. The role of oncogene activation and tumor suppressor inhibition in the onset of cancer is well established. Traditional antitumor therapies target specific molecules, the action/expression of which is altered in cancer cells. However, since the physiology of normal cells involves the same signaling pathways that are disturbed in cancer cells, targeted therapies have to deal with side effects and multidrug resistance, the main causes of therapy failure. Since the pioneering work of Otto Warburg, over 80 years ago, the subversion of normal metabolism displayed by cancer cells has been highlighted by many studies. Recently, the study of tumor metabolism has received much attention because metabolic transformation is a crucial cancer hallmark and a direct consequence of disturbances in the activities of oncogenes and tumor suppressors. In this review we discuss tumor metabolism from the molecular perspective of oncogenes, tumor suppressors and protein signaling pathways relevant to metabolic transformation and tumorigenesis. We also identify the principal unanswered questions surrounding this issue and the attempts to relate these to their potential for future cancer treatment. As will be made clear, tumor metabolism is still only partly understood and the metabolic aspects of transformation constitute a major challenge for science. Nevertheless, cancer metabolism can be exploited to devise novel avenues for the rational treatment of this disease. Copyright (C) 2011 S. Karger AG, Basel.
    Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP); Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq). FAPESP: 10/16050-9; FAPESP: 10/11475-1; FAPESP: 08/51116-0.
    Affiliations: Univ Fed ABC UFABC, CCNH, Santo Andre, Brazil; Universidade Federal de São Paulo UNIFESP, Dept Ciencias Biol, São Paulo, Brazil; Universidade Federal de São Paulo UNIFESP, Dept Bioquim, São Paulo, Brazil; Univ Fed Sao Carlos UFSCar, DFQM, Sorocaba, Brazil.

    The Pathway Coexpression Network: Revealing pathway relationships

    A goal of genomics is to understand the relationships between biological processes. Pathways contribute to functional interplay within biological processes through complex but poorly understood interactions. However, limited functional references for global pathway relationships exist. Pathways from databases such as KEGG and Reactome provide discrete annotations of biological processes. Their relationships are currently either inferred from gene set enrichment within specific experiments, or by simple overlap, linking pathway annotations that have genes in common. Here, we provide a unifying interpretation of functional interaction between pathways by systematically quantifying coexpression between 1,330 canonical pathways from the Molecular Signatures Database (MSigDB) to establish the Pathway Coexpression Network (PCxN). We estimated the correlation between canonical pathways valid in a broad context using a curated collection of 3,207 microarrays from 72 normal human tissues. PCxN accounts for shared genes between annotations to estimate significant correlations between pathways with related functions rather than with similar annotations. We demonstrate that PCxN provides novel insight into mechanisms of complex diseases using an Alzheimer's Disease (AD) case study. PCxN retrieved pathways significantly correlated with an expert-curated AD gene list. These pathways have known associations with AD and were significantly enriched for genes independently associated with AD. As a further step, we show how PCxN complements the results of gene set enrichment methods by revealing relationships between enriched pathways, and by identifying additional highly correlated pathways. PCxN revealed that correlated pathways from an AD expression profiling study include functional clusters involved in cell adhesion and oxidative stress. PCxN provides expanded connections to pathways from the extracellular matrix. PCxN provides a powerful new framework for interrogation of global pathway relationships. Comprehensive exploration of PCxN can be performed at http://pcxn.org/