An expanded evaluation of protein function prediction methods shows an improvement in accuracy
Background: A major bottleneck in our understanding of the molecular
underpinnings of life is the assignment of function to proteins. While
molecular experiments provide the most reliable annotation of proteins, their
relatively low throughput and restricted purview have led to an increasing
role for computational function prediction. However, assessing methods for
protein function prediction and tracking progress in the field remain
challenging. Results: We conducted the second critical assessment of functional
annotation (CAFA), a timed challenge to assess computational methods that
automatically assign protein function. We evaluated 126 methods from 56
research groups for their ability to predict biological functions using Gene
Ontology and gene-disease associations using Human Phenotype Ontology on a set
of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared
with CAFA1, with regard to data set size, variety, and assessment metrics. To
review progress in the field, the analysis compared the best methods from
CAFA1 to those of CAFA2. Conclusions: The top-performing methods in CAFA2
outperformed those from CAFA1. This increased accuracy can be attributed to a
combination of the growing number of experimental annotations and improved
methods for function prediction. The assessment also revealed that the
definition of top-performing algorithms is ontology specific, that different
performance metrics can be used to probe the nature of accurate predictions,
and the relative diversity of predictions in the biological process and human
phenotype ontologies. While there was methodological improvement between CAFA1
and CAFA2, the interpretation of results and usefulness of individual methods
remain context-dependent.
Prediction of Human Phenotype Ontology terms by means of hierarchical ensemble methods
Background: The prediction of human gene–abnormal phenotype associations is a
fundamental step toward the discovery of novel genes associated with human
disorders, especially when no genes are known to be associated with a specific
disease. In this context the Human Phenotype Ontology (HPO) provides a
standard categorization of the abnormalities associated with human diseases.
While the problem of the prediction of gene–disease associations has been
widely investigated, the related problem of gene–phenotypic feature (i.e., HPO
term) associations has been largely overlooked, even though for most human genes
no HPO term associations are known and despite the increasing application of
the HPO to relevant medical problems. Moreover, most of the methods proposed in
the literature are not able to capture the hierarchical relationships between HPO
terms, thus resulting in inconsistent and relatively inaccurate predictions.
Results: We present two hierarchical ensemble methods that we formally prove to
provide biologically consistent predictions according to the hierarchical
structure of the HPO. The modular structure of the proposed methods, which
consists of a “flat” learning first step and a hierarchical combination of the
predictions in the second step, allows the predictions of virtually any flat
learning method to be enhanced. The experimental results show that
hierarchical ensemble methods are able to predict novel associations between
genes and abnormal phenotypes with results that are competitive with state-of-
the-art algorithms and with a significant reduction of the computational
complexity. Conclusions: Hierarchical ensembles are efficient computational
methods that guarantee biologically meaningful predictions that obey the true
path rule, and can be used as a tool to improve and make consistent the HPO
term predictions starting from virtually any flat learning method. The
implementation of the proposed methods is available as an R package from the
CRAN repository.
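The two-step idea in this abstract (flat scores first, hierarchical combination second) can be illustrated with a minimal sketch. This is not the authors' R implementation. It is one simple top-down correction, assumed here for illustration, that caps each term's score at the scores of its parents so that the true path rule holds. The term names, the scores, and the function name `top_down_consistent` are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def top_down_consistent(flat_scores, parents):
    """Cap each term's flat score by the corrected scores of its parents.

    flat_scores: dict mapping term -> score in [0, 1] from any flat learner.
    parents:     dict mapping term -> iterable of parent terms (a DAG).
    Assumes every term that appears as a parent also has a flat score.
    """
    # Map each term to its predecessors so that parents are visited first.
    graph = {term: tuple(parents.get(term, ())) for term in flat_scores}
    corrected = {}
    for term in TopologicalSorter(graph).static_order():
        score = flat_scores[term]
        for parent in graph[term]:
            # A child may never score higher than any of its parents.
            score = min(score, corrected[parent])
        corrected[term] = score
    return corrected


# Hypothetical toy ontology: "c" has two parents, "a" and "b".
flat = {"root": 0.9, "a": 0.5, "b": 0.8, "c": 0.7}
dag = {"a": ["root"], "b": ["root"], "c": ["a", "b"]}
consistent = top_down_consistent(flat, dag)
```

After the correction, no term scores higher than any of its ancestors, so thresholding the scores at any cutoff yields a set of terms that is closed under the ancestor relation.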
Reasoning about goal-directed real-time teleo-reactive programs
The teleo-reactive programming model is a high-level approach to developing real-time systems that supports hierarchical composition and durative actions. The model is different from frameworks such as action systems, timed automata and TLA+, and allows programs to be more compact and descriptive of their intended behaviour. Teleo-reactive programs are particularly useful for implementing controllers for autonomous agents that must react robustly to their dynamically changing environments. In this paper, we develop a real-time logic that is based on Duration Calculus and use this logic to formalise the semantics of teleo-reactive programs. We develop rely/guarantee rules that facilitate reasoning about a program and its environment in a compositional manner. We present several theorems for simplifying proofs of teleo-reactive programs and present a partially mechanised method for proving progress properties of goal-directed agents. © 2013 British Computer Society
a comprehensive and efficient analysis pipeline designed for ChIP-nexus
Background: ChIP-nexus, an extension of the ChIP-exo protocol, can be used to
map the borders of protein-bound DNA sequences at nucleotide resolution,
requires less input DNA, and enables selective PCR duplicate removal using
random barcodes. However, the use of random barcodes requires additional
preprocessing of the mapping data, which complicates the computational
analysis. To date, only a very limited number of software packages are
available for the analysis of ChIP-exo data, which have not yet been
systematically tested and compared on ChIP-nexus data. Results: Here, we
present a comprehensive software package for ChIP-nexus data that exploits the
random barcodes for selective removal of PCR duplicates and for quality
control. Furthermore, we developed bespoke methods to estimate the width of
the protected region resulting from protein-DNA binding and to infer binding
positions from ChIP-nexus data. Finally, we applied our peak calling method as
well as the two other methods MACE and MACS2 to the available ChIP-nexus data.
Conclusions: The Q-nexus software is efficient and easy to use. Novel
statistics about duplication rates in consideration of random barcodes are
calculated. Our method for the estimation of the width of the protected region
yields unbiased signatures that are highly reproducible for biological
replicates and at the same time very specific for the respective factors
analyzed. As judged by the irreproducible discovery rate (IDR), our peak
calling algorithm shows substantially better reproducibility. An
implementation of Q-nexus is available at http://charite.github.io/Q/
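The barcode-based duplicate removal mentioned in this abstract can be sketched as follows. This is an illustrative simplification and not the Q-nexus implementation: it assumes that two reads are PCR duplicates only when they share chromosome, 5' position, strand, and random barcode. The `Read` type and the function names are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    chrom: str
    start: int    # 5' end position of the mapped read
    strand: str   # "+" or "-"
    barcode: str  # random barcode attached before PCR amplification


def remove_pcr_duplicates(reads: Iterable[Read]) -> List[Read]:
    """Keep the first read for each (chrom, start, strand, barcode) key."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.start, read.strand, read.barcode)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def duplication_rate(reads: List[Read]) -> float:
    """Fraction of reads discarded as barcode-aware duplicates."""
    if not reads:
        return 0.0
    return 1.0 - len(remove_pcr_duplicates(reads)) / len(reads)


reads = [
    Read("chr1", 100, "+", "ACGTA"),
    Read("chr1", 100, "+", "ACGTA"),  # same position and barcode: duplicate
    Read("chr1", 100, "+", "TTGCA"),  # same position, new barcode: kept
    Read("chr1", 100, "-", "ACGTA"),  # other strand: kept
]
unique_reads = remove_pcr_duplicates(reads)
```

Reads that pile up at the same position but carry different barcodes are retained, which matters for this kind of assay because genuine signal concentrates at a few border positions.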
the rare bone disorders use case
Background: Lately, ontologies have become a fundamental building block in the
process of formalising and storing complex biomedical information. The
community-driven ontology curation process, however, ignores the possibility
of multiple communities building, in parallel, conceptualisations of the same
domain, and thus providing slightly different perspectives on the same
knowledge. The individual nature of this effort creates the need for a
mechanism that enables an overarching and comprehensive overview of
the different perspectives on the domain knowledge. Results: We introduce an
approach that enables the loose integration of knowledge emerging from diverse
sources under a single coherent interoperable resource. To accurately track
the original knowledge statements, we record the provenance at very granular
levels. We exemplify the approach in the rare bone disorders domain by
proposing the Rare Bone Disorders Ontology (RBDO). Using RBDO, researchers are
able to answer queries, such as: “What phenotypes describe a particular
disorder and are common to all sources?” or to understand similarities between
disorders based on divergent groupings (classifications) provided by the
underlying sources.
The extinct, giant giraffid Sivatherium giganteum: skeletal reconstruction and body mass estimation
Sivatherium giganteum is an extinct giraffid from the Plio–Pleistocene boundary of the Himalayan foothills. To date, there has been no rigorous skeletal reconstruction of this unusual mammal. Historical and contemporary accounts anecdotally state that Sivatherium rivalled the African elephant in terms of its body mass, but this statement has never been tested. Here, we present a three-dimensional composite skeletal reconstruction and calculate a representative body mass estimate for this species using a volumetric method. We find that the estimated adult body mass of 1246 kg (range 857–1812 kg) does not approach that of an African elephant, but confirms that Sivatherium was certainly a large giraffid, and may have been the largest ruminant mammal that has ever existed. We contrast this volumetric estimate with a bivariate scaling estimate derived from Sivatherium's humeral circumference and find that there is a discrepancy between the two. The difference implies that the humeral circumference of Sivatherium is greater than expected for an animal of this size, and we speculate this may be linked to a cranial shift in centre of mass.
Perspectives on the revised Ghent criteria for the diagnosis of Marfan syndrome
Three international nosologies have been proposed for the diagnosis of Marfan syndrome (MFS): the Berlin nosology in 1988; the Ghent nosology in 1996 (Ghent-1); and the revised Ghent nosology in 2010 (Ghent-2). We reviewed the literature and discussed the challenges and concepts of diagnosing MFS in adults. Ghent-1 proposed more stringent clinical criteria, which led to the confirmation of MFS in only 32%-53% of patients formerly diagnosed with MFS according to the Berlin nosology. Conversely, both the Ghent-1 and Ghent-2 nosologies diagnosed MFS, and both yielded similar frequencies of MFS in persons with a causative FBN1 mutation (90% for Ghent-1 versus 92% for Ghent-2) and in persons not having a causative FBN1 mutation (15% versus 13%). Quality criteria for diagnostic methods include objectivity, reliability, and validity. However, the nosology-based diagnosis of MFS lacks a diagnostic reference standard and, hence, quality criteria such as sensitivity, specificity, or accuracy cannot be assessed. Medical utility of diagnosis implies congruency with the historical criteria of MFS, as well as with information about the etiology, pathogenesis, diagnostic triggers, prognostic triggers, and potential complications of MFS. In addition, social and psychological utilities of diagnostic criteria include acceptance by patients, patient organizations, clinicians and scientists, practicability, costs, and the reduction of anxiety. Since the utility of a diagnosis or exclusion of MFS is context-dependent, prioritization of utilities is a strategic decision in the process of nosology development. Screening tests for MFS should be used to identify persons with MFS. To confirm the diagnosis of MFS, Ghent-1 and Ghent-2 perform similarly, but Ghent-2 is easier to use. To maximize the utility of the diagnostic criteria of MFS, a fair and transparent process of nosology development is essential
Possible germ cell-Sertoli cell interactions are critical for establishing appropriate expression levels for the Sertoli cell-specific MicroRNA, miR-202-5p, in human testis
A Novel Genome-Wide Association Study Approach Using Genotyping by Exome Sequencing Leads to the Identification of a Primary Open Angle Glaucoma Associated Inversion Disrupting ADAMTS17
Closed breeding populations in the dog in conjunction with advances in gene mapping and sequencing techniques facilitate mapping of autosomal recessive diseases and identification of novel disease-causing variants, often using unorthodox experimental designs. In our investigation we demonstrate successful mapping of the locus for primary open angle glaucoma in the Petit Basset Griffon Vendéen dog breed with 12 cases and 12 controls, using a novel genotyping by exome sequencing approach. The resulting genome-wide association signal was followed up by genome sequencing of an individual case, leading to the identification of an inversion with a breakpoint disrupting the ADAMTS17 gene. Genotyping of additional controls and expression analysis provide strong evidence that the inversion is disease causing. Evidence of cryptic splicing resulting in novel exon transcription as a consequence of the inversion in ADAMTS17 is identified through RNAseq experiments. This investigation demonstrates how a novel genotyping by exome sequencing approach can be used to map an autosomal recessive disorder in the dog, with the use of genome sequencing to facilitate identification of a disease-associated variant
Isomerization dynamics of a buckled nanobeam
We analyze the dynamics of a model of a nanobeam under compression. The model
is a two mode truncation of the Euler-Bernoulli beam equation subject to
compressive stress. We consider parameter regimes where the first mode is
unstable and the second mode can be either stable or unstable, and the
remaining modes (neglected) are always stable. Material parameters used
correspond to silicon. The two mode model Hamiltonian is the sum of a
(diagonal) kinetic energy term and a potential energy term. The form of the
potential energy function suggests an analogy with isomerisation reactions in
chemistry. We therefore study the dynamics of the buckled beam using the
conceptual framework established for the theory of isomerisation reactions.
When the second mode is stable the potential energy surface has an index one
saddle and when the second mode is unstable the potential energy surface has an
index two saddle and two index one saddles. Symmetry of the system allows us to
construct a phase space dividing surface between the two "isomers" (buckled
states). The energy range is sufficiently wide that we can treat the effects of
the index one and index two saddles in a unified fashion. We have computed
reactive fluxes, mean gap times and reactant phase space volumes for three
stress values at several different energies. In all cases the phase space
volume swept out by isomerizing trajectories is considerably less than the
reactant density of states, proving that the dynamics is highly nonergodic. The
associated gap time distributions consist of one or more `pulses' of
trajectories. Computation of the reactive flux correlation function shows no
sign of a plateau region; rather, the flux exhibits oscillatory decay,
indicating that, for the 2-mode model in the physical regime considered, a rate
constant for isomerization does not exist.
Comment: 42 pages, 6 figures
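The saddle classification described in this abstract can be illustrated on a toy potential. The quartic form below is an assumption chosen only for illustration and is not the paper's two-mode beam potential: V(x, y) = -a x^2/2 - c y^2/2 + (x^2 + y^2)^2/4, where a > 0 plays the role of an unstable first mode and the sign of c decides whether the second mode is unstable (c > 0) or stable (c < 0). The index of a critical point is the number of negative eigenvalues of the Hessian there.

```python
import math


def hessian(x, y, a, c):
    """Second derivatives of the toy potential
    V(x, y) = -a*x**2/2 - c*y**2/2 + (x**2 + y**2)**2/4  (hypothetical form).
    """
    r2 = x * x + y * y
    vxx = -a + r2 + 2 * x * x
    vyy = -c + r2 + 2 * y * y
    vxy = 2 * x * y
    return vxx, vxy, vyy


def saddle_index(x, y, a, c):
    """Count negative Hessian eigenvalues at the critical point (x, y)."""
    vxx, vxy, vyy = hessian(x, y, a, c)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mean = (vxx + vyy) / 2
    radius = math.hypot((vxx - vyy) / 2, vxy)
    return sum(1 for eig in (mean - radius, mean + radius) if eig < 0)


# Second mode stable (c < 0): the origin is an index-one saddle.
index_stable = saddle_index(0.0, 0.0, a=2.0, c=-1.0)

# Second mode unstable (0 < c < a): the origin becomes an index-two saddle,
# flanked by two index-one saddles at (0, +sqrt(c)) and (0, -sqrt(c)).
index_origin = saddle_index(0.0, 0.0, a=2.0, c=1.0)
index_side = saddle_index(0.0, 1.0, a=2.0, c=1.0)

# The two minima at (+sqrt(a), 0) and (-sqrt(a), 0) stand in for the two
# buckled states.
index_minimum = saddle_index(math.sqrt(2.0), 0.0, a=2.0, c=1.0)
```

In this toy model the count of saddles matches the abstract's description: one index-one saddle when the second mode is stable, and one index-two saddle plus two index-one saddles when it is unstable.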
