102 research outputs found

    Computational algorithms to predict Gene Ontology annotations

    Background: Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing gene functional features, are of paramount importance in modern biology. Datasets of these annotations, such as the ones provided by the Gene Ontology Consortium, are used to design novel biological experiments and interpret their results. Despite their importance, these sources of information have known issues: they are incomplete, since biological knowledge is far from definitive and evolves rapidly, and they may contain erroneous annotations. Since curating novel annotations is costly in both money and time, computational tools that can reliably predict likely annotations, and thus speed up the discovery of new gene annotations, are very useful. Methods: We used a set of computational algorithms and weighting schemes to infer novel gene annotations from a set of known ones. We followed the latent semantic analysis approach, implementing two popular algorithms (Latent Semantic Indexing and Probabilistic Latent Semantic Analysis), and we propose a novel method, the Semantic IMproved Latent Semantic Analysis, which adds a clustering step on the set of considered genes. Furthermore, we propose improving these algorithms by weighting the annotations in the input set. Results: We tested our methods and their weighted variants on the Gene Ontology annotation sets of the genes of three model organisms (Bos taurus, Danio rerio and Drosophila melanogaster). The methods proved able to predict novel gene annotations, and the weighting procedures led to a valuable improvement, although the results vary with the size of the input annotation set and the algorithm considered. Conclusions: Of the three methods considered, the Semantic IMproved Latent Semantic Analysis provides the best results. In particular, when coupled with a proper weighting policy, it predicts a significant number of novel annotations, showing that it can be a helpful tool for supporting scientists in the curation of gene functional annotations.
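The Latent Semantic Indexing step mentioned in this abstract can be illustrated with a minimal sketch. It assumes LSI is realised as a truncated SVD of a binary gene-by-term matrix, and that an absent annotation is suggested when its reconstructed score passes a cut-off. The function name `lsi_predict` and the parameters `rank` and `threshold` are hypothetical, and the abstract does not specify the authors' actual procedure or weighting schemes.

```python
import numpy as np

def lsi_predict(annotations, rank, threshold):
    """Suggest new (gene, term) pairs from a rank-limited SVD reconstruction.

    `rank` and `threshold` are illustrative parameters, not values from the paper.
    """
    A = np.asarray(annotations, dtype=float)
    # Thin SVD: A = U @ diag(s) @ Vt, singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the `rank` largest singular values.
    A_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    # A candidate is an absent annotation whose reconstructed score is high.
    candidates = (A == 0) & (A_hat >= threshold)
    return [(int(g), int(t)) for g, t in np.argwhere(candidates)]


# Toy matrix: rows are genes, columns are ontology terms (1 = annotated).
toy = np.array([[1, 1],
                [1, 1],
                [1, 0]])
predicted = lsi_predict(toy, rank=1, threshold=0.4)
print(predicted)  # [(2, 1)]: gene 2 is suggested for term 1
```

In this toy case the rank-1 reconstruction gives the missing entry a score of about 0.49 because gene 2 shares term 0 with genes that also carry term 1. Raising the threshold to 0.5, or keeping the full rank, yields no suggestions.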

    General Adaptive Neighborhood Image Restoration, Enhancement and Segmentation

    This paper outlines the recently introduced General Adaptive Neighborhood Image Processing (GANIP) approach [1–3]. An intensity image is represented by a set of local neighborhoods, one defined for each point of the image under study. These so-called General Adaptive Neighborhoods (GANs) adapt simultaneously to the spatial structures, the analyzing scales and the physical settings of the image to be addressed and/or of the human visual system. After a brief theoretical survey, the GANIP approach is applied to real examples in image restoration, enhancement and segmentation.

    GenoMetric Query Language: A novel approach to large-scale genomic data management

    Motivation: Improvement of sequencing technologies and data processing pipelines is rapidly providing sequencing data, with associated high-level features, of many individual genomes in multiple biological and clinical conditions. These data allow for data-driven genomic, transcriptomic and epigenomic characterizations, but require state-of-the-art ‘big data’ computing strategies, with abstraction levels beyond available tool capabilities. Results: We propose a high-level, declarative GenoMetric Query Language (GMQL) and a toolkit for its use. GMQL operates downstream of raw data preprocessing pipelines and supports queries over thousands of heterogeneous datasets and samples; as such it is key to genomic ‘big data’ analysis. GMQL leverages a simple data model that provides both abstractions of genomic region data and associated experimental, biological and clinical metadata, and interoperability between many data formats. Built on the Hadoop framework and the Apache Pig platform, GMQL ensures high scalability, expressivity, flexibility and simplicity of use, as demonstrated by several biological query examples on ENCODE and TCGA datasets. Availability and implementation: The GMQL toolkit is freely available for non-commercial use at http://www.bioinformatics.deib.polimi.it/GMQL/. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

    DeepBrain: Functional Representation of Neural In-Situ Hybridization Images for Gene Ontology Classification Using Deep Convolutional Autoencoders

    This paper presents a novel deep learning-based method for learning a functional representation of mammalian neural images. The method uses a deep convolutional denoising autoencoder (CDAE) to generate an invariant, compact representation of in situ hybridization (ISH) images. While most existing methods for bio-imaging analysis were not developed to handle images with highly complex anatomical structures, the results presented in this paper show that the functional representation extracted by the CDAE can help learn features of functional gene ontology categories and classify them with high accuracy. Using this CDAE representation, our method outperforms the previous state-of-the-art classification rate, improving the average AUC from 0.92 to 0.98, i.e., achieving a 75% reduction in error. To keep the computation feasible, the method operates on input images that were significantly downsampled with respect to the originals.
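The 75% figure quoted in this abstract can be checked with a short calculation, assuming (the abstract does not say so explicitly) that the error is taken to be 1 − AUC:

```latex
\begin{aligned}
e_{\text{before}} &= 1 - 0.92 = 0.08, \qquad e_{\text{after}} = 1 - 0.98 = 0.02, \\
\frac{e_{\text{before}} - e_{\text{after}}}{e_{\text{before}}} &= \frac{0.08 - 0.02}{0.08} = 0.75 .
\end{aligned}
```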

    Physical justifications and applications of the LIP model for the processing of transmitted light images

    The Logarithmic Image Processing (LIP) model is a mathematical framework that provides a specific set of algebraic and functional operations for the processing of non-linear images and signals. In this paper, the initial ideas and some notions of the LIP model are first introduced. It is then shown that the physical absorption laws of monochromatic and panchromatic light waves can be expressed within this mathematical framework. The connections of the LIP model with several important physical characteristics of transmitted light images are presented. Finally, the effectiveness of the LIP model is illustrated in four image processing areas: illumination correction, background removal, dynamic range stabilization and dynamic range control.
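The abstract does not spell out the LIP operations, so the sketch below uses the definitions commonly given in the LIP literature: addition f ⊕ g = f + g − fg/M and scalar multiplication λ ⊗ f = M − M(1 − f/M)^λ, with gray tones in [0, M). The value M = 256 and the helper names are assumptions made for illustration. The example shows how LIP addition mirrors the absorption law for stacked media, where transmittances multiply.

```python
M = 256.0  # assumed upper bound of the gray-tone range (8-bit images)

def lip_add(f, g, m=M):
    """LIP addition: f (+) g = f + g - f*g/m."""
    return f + g - f * g / m

def lip_scalar_mul(lam, f, m=M):
    """LIP scalar multiplication: lam (x) f = m - m*(1 - f/m)**lam."""
    return m - m * (1.0 - f / m) ** lam

def transmittance(f, m=M):
    """Fraction of light transmitted by a medium with gray tone f."""
    return 1.0 - f / m

# Two absorbing layers with gray tones f and g, placed one after the other.
f, g = 128.0, 64.0
stacked = lip_add(f, g)
print(stacked)                                   # 160.0
print(transmittance(stacked))                    # 0.375
print(transmittance(f) * transmittance(g))       # 0.375
```

Under these definitions the result of an addition always stays below M, which is one reason the model suits bounded intensity ranges.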

    Non-Negative Matrix Tri-Factorization for Representation Learning in Multi-Omics Datasets with Applications to Drug Repurposing and Selection

    The vast corpus of heterogeneous biomedical data stored in databases, ontologies, and terminologies presents a unique opportunity for drug design. Integrating and fusing these sources is essential to develop data representations that can be analyzed using artificial intelligence methods to generate novel drug candidates or hypotheses. Here, we propose Non-Negative Matrix Tri-Factorization as an invaluable tool for integrating and fusing data, as well as for representation learning. Additionally, we demonstrate how representations learned by Non-Negative Matrix Tri-Factorization can effectively be utilized by traditional artificial intelligence methods. While this approach is domain-agnostic and applicable to any field with vast amounts of structured and semi-structured data, we apply it specifically to computational pharmacology and drug repurposing. This field is poised to benefit significantly from artificial intelligence, particularly in personalized medicine. We conducted extensive experiments to evaluate the performance of the proposed method, yielding exciting results, particularly compared to traditional methods. Novel drug-target predictions have also been validated in the literature. Additionally, we tested our method on predicting drug synergism, where constructing a classical matrix dataset is challenging. The method demonstrated great flexibility, suggesting its applicability to a wide range of tasks in drug design and discovery.
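As a rough illustration of what tri-factorization means, the sketch below factors a single non-negative matrix X as F S Gᵀ using generic multiplicative updates for the squared reconstruction error. It is a textbook-style sketch, not the paper's multi-source fusion method. The function name `nmtf`, the ranks, the iteration count and the toy "drug by target" matrix are all assumptions made for illustration.

```python
import numpy as np

def nmtf(X, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix X (n x m) as F @ S @ G.T.

    F is n x k1, S is k1 x k2, G is m x k2; all factors stay non-negative
    because each update multiplies by a ratio of non-negative quantities.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k1)) + eps
    S = rng.random((k1, k2)) + eps
    G = rng.random((m, k2)) + eps
    for _ in range(n_iter):
        # Multiplicative updates derived from the gradient of ||X - F S G^T||^2.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G


# Hypothetical toy matrix: rows could be drugs, columns targets, with two blocks.
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
F, S, G = nmtf(X, k1=2, k2=2)
scores = F @ S @ G.T  # reconstructed association scores
print(np.round(scores, 2))
```

Rows of F and G can be read as low-dimensional representations of the row and column entities, while S links the two latent spaces. High reconstructed scores at positions where X is zero are the kind of entries one would inspect as candidate associations.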

    Employing a systematic approach to biobanking and analyzing clinical and genetic data for advancing COVID-19 research
