    An expanded evaluation of protein function prediction methods shows an improvement in accuracy

    Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. Results: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.

    Prediction of Human Phenotype Ontology terms by means of hierarchical ensemble methods

    Background: The prediction of human gene–abnormal phenotype associations is a fundamental step toward the discovery of novel genes associated with human disorders, especially when no genes are known to be associated with a specific disease. In this context the Human Phenotype Ontology (HPO) provides a standard categorization of the abnormalities associated with human diseases. While the problem of the prediction of gene–disease associations has been widely investigated, the related problem of gene–phenotypic feature (i.e., HPO term) associations has been largely overlooked, even though for most human genes no HPO term associations are known and despite the increasing application of the HPO to relevant medical problems. Moreover, most of the methods proposed in the literature are not able to capture the hierarchical relationships between HPO terms, thus resulting in inconsistent and relatively inaccurate predictions. Results: We present two hierarchical ensemble methods that we formally prove to provide biologically consistent predictions according to the hierarchical structure of the HPO. The modular structure of the proposed methods, which consists of a “flat” learning first step and a hierarchical combination of the predictions in the second step, allows the predictions of virtually any flat learning method to be enhanced. The experimental results show that hierarchical ensemble methods are able to predict novel associations between genes and abnormal phenotypes with results that are competitive with state-of-the-art algorithms and with a significant reduction of the computational complexity. Conclusions: Hierarchical ensembles are efficient computational methods that guarantee biologically meaningful predictions that obey the true path rule, and can be used as a tool to improve and make consistent the HPO term predictions starting from virtually any flat learning method. The implementation of the proposed methods is available as an R package from the CRAN repository.

    Reasoning about goal-directed real-time teleo-reactive programs

    The teleo-reactive programming model is a high-level approach to developing real-time systems that supports hierarchical composition and durative actions. The model is different from frameworks such as action systems, timed automata and TLA+, and allows programs to be more compact and descriptive of their intended behaviour. Teleo-reactive programs are particularly useful for implementing controllers for autonomous agents that must react robustly to their dynamically changing environments. In this paper, we develop a real-time logic that is based on Duration Calculus and use this logic to formalise the semantics of teleo-reactive programs. We develop rely/guarantee rules that facilitate reasoning about a program and its environment in a compositional manner. We present several theorems for simplifying proofs of teleo-reactive programs and present a partially mechanised method for proving progress properties of goal-directed agents. © 2013 British Computer Society

    a comprehensive and efficient analysis pipeline designed for ChIP-nexus

    Background: ChIP-nexus, an extension of the ChIP-exo protocol, can be used to map the borders of protein-bound DNA sequences at nucleotide resolution, requires less input DNA and enables selective PCR duplicate removal using random barcodes. However, the use of random barcodes requires additional preprocessing of the mapping data, which complicates the computational analysis. To date, only a very limited number of software packages are available for the analysis of ChIP-exo data, which have not yet been systematically tested and compared on ChIP-nexus data. Results: Here, we present a comprehensive software package for ChIP-nexus data that exploits the random barcodes for selective removal of PCR duplicates and for quality control. Furthermore, we developed bespoke methods to estimate the width of the protected region resulting from protein-DNA binding and to infer binding positions from ChIP-nexus data. Finally, we applied our peak calling method as well as the two other methods MACE and MACS2 to the available ChIP-nexus data. Conclusions: The Q-nexus software is efficient and easy to use. Novel statistics about duplication rates in consideration of random barcodes are calculated. Our method for the estimation of the width of the protected region yields unbiased signatures that are highly reproducible for biological replicates and at the same time very specific for the respective factors analyzed. As judged by the irreproducible discovery rate (IDR), our peak calling algorithm shows a substantially better reproducibility. An implementation of Q-nexus is available at http://charite.github.io/Q/

    the rare bone disorders use case

    Background: Lately, ontologies have become a fundamental building block in the process of formalising and storing complex biomedical information. The community-driven ontology curation process, however, ignores the possibility of multiple communities building, in parallel, conceptualisations of the same domain, and thus providing slightly different perspectives on the same knowledge. The individual nature of this effort leads to the need for a mechanism that enables us to create an overarching and comprehensive overview of the different perspectives on the domain knowledge. Results: We introduce an approach that enables the loose integration of knowledge emerging from diverse sources under a single coherent interoperable resource. To accurately track the original knowledge statements, we record the provenance at very granular levels. We exemplify the approach in the rare bone disorders domain by proposing the Rare Bone Disorders Ontology (RBDO). Using RBDO, researchers are able to answer queries, such as: “What phenotypes describe a particular disorder and are common to all sources?” or to understand similarities between disorders based on divergent groupings (classifications) provided by the underlying sources.

    The extinct, giant giraffid Sivatherium giganteum: skeletal reconstruction and body mass estimation

    Sivatherium giganteum is an extinct giraffid from the Plio–Pleistocene boundary of the Himalayan foothills. To date, there has been no rigorous skeletal reconstruction of this unusual mammal. Historical and contemporary accounts anecdotally state that Sivatherium rivalled the African elephant in terms of its body mass, but this statement has never been tested. Here, we present a three-dimensional composite skeletal reconstruction and calculate a representative body mass estimate for this species using a volumetric method. We find that the estimated adult body mass of 1246 kg (857–1812 kg range) does not approach that of an African elephant, but confirms that Sivatherium was certainly a large giraffid, and may have been the largest ruminant mammal that has ever existed. We contrast this volumetric estimate with a bivariate scaling estimate derived from Sivatherium's humeral circumference and find that there is a discrepancy between the two. The difference implies that the humeral circumference of Sivatherium is greater than expected for an animal of this size, and we speculate this may be linked to a cranial shift in centre of mass.

    Perspectives on the revised Ghent criteria for the diagnosis of Marfan syndrome

    Three international nosologies have been proposed for the diagnosis of Marfan syndrome (MFS): the Berlin nosology in 1988; the Ghent nosology in 1996 (Ghent-1); and the revised Ghent nosology in 2010 (Ghent-2). We reviewed the literature and discussed the challenges and concepts of diagnosing MFS in adults. Ghent-1 proposed more stringent clinical criteria, which led to the confirmation of MFS in only 32%-53% of patients formerly diagnosed with MFS according to the Berlin nosology. Conversely, both the Ghent-1 and Ghent-2 nosologies diagnosed MFS, and both yielded similar frequencies of MFS in persons with a causative FBN1 mutation (90% for Ghent-1 versus 92% for Ghent-2) and in persons not having a causative FBN1 mutation (15% versus 13%). Quality criteria for diagnostic methods include objectivity, reliability, and validity. However, the nosology-based diagnosis of MFS lacks a diagnostic reference standard and, hence, quality criteria such as sensitivity, specificity, or accuracy cannot be assessed. Medical utility of diagnosis implies congruency with the historical criteria of MFS, as well as with information about the etiology, pathogenesis, diagnostic triggers, prognostic triggers, and potential complications of MFS. In addition, social and psychological utilities of diagnostic criteria include acceptance by patients, patient organizations, clinicians and scientists, practicability, costs, and the reduction of anxiety. Since the utility of a diagnosis or exclusion of MFS is context-dependent, prioritization of utilities is a strategic decision in the process of nosology development. Screening tests for MFS should be used to identify persons with MFS. To confirm the diagnosis of MFS, Ghent-1 and Ghent-2 perform similarly, but Ghent-2 is easier to use. To maximize the utility of the diagnostic criteria of MFS, a fair and transparent process of nosology development is essential

    A Novel Genome-Wide Association Study Approach Using Genotyping by Exome Sequencing Leads to the Identification of a Primary Open Angle Glaucoma Associated Inversion Disrupting ADAMTS17

    Closed breeding populations in the dog in conjunction with advances in gene mapping and sequencing techniques facilitate mapping of autosomal recessive diseases and identification of novel disease-causing variants, often using unorthodox experimental designs. In our investigation we demonstrate successful mapping of the locus for primary open angle glaucoma in the Petit Basset Griffon Vendéen dog breed with 12 cases and 12 controls, using a novel genotyping by exome sequencing approach. The resulting genome-wide association signal was followed up by genome sequencing of an individual case, leading to the identification of an inversion with a breakpoint disrupting the ADAMTS17 gene. Genotyping of additional controls and expression analysis provide strong evidence that the inversion is disease causing. Evidence of cryptic splicing resulting in novel exon transcription as a consequence of the inversion in ADAMTS17 is identified through RNAseq experiments. This investigation demonstrates how a novel genotyping by exome sequencing approach can be used to map an autosomal recessive disorder in the dog, with the use of genome sequencing to facilitate identification of a disease-associated variant

    Isomerization dynamics of a buckled nanobeam

    We analyze the dynamics of a model of a nanobeam under compression. The model is a two mode truncation of the Euler-Bernoulli beam equation subject to compressive stress. We consider parameter regimes where the first mode is unstable and the second mode can be either stable or unstable, and the remaining modes (neglected) are always stable. Material parameters used correspond to silicon. The two mode model Hamiltonian is the sum of a (diagonal) kinetic energy term and a potential energy term. The form of the potential energy function suggests an analogy with isomerization reactions in chemistry. We therefore study the dynamics of the buckled beam using the conceptual framework established for the theory of isomerization reactions. When the second mode is stable the potential energy surface has an index one saddle and when the second mode is unstable the potential energy surface has an index two saddle and two index one saddles. Symmetry of the system allows us to construct a phase space dividing surface between the two "isomers" (buckled states). The energy range is sufficiently wide that we can treat the effects of the index one and index two saddles in a unified fashion. We have computed reactive fluxes, mean gap times and reactant phase space volumes for three stress values at several different energies. In all cases the phase space volume swept out by isomerizing trajectories is considerably less than the reactant density of states, proving that the dynamics is highly nonergodic. The associated gap time distributions consist of one or more 'pulses' of trajectories. Computation of the reactive flux correlation function shows no sign of a plateau region; rather, the flux exhibits oscillatory decay, indicating that, for the 2-mode model in the physical regime considered, a rate constant for isomerization does not exist. Comment: 42 pages, 6 figures