
    Counting, generating and sampling tree alignments

    Pairwise ordered tree alignments are combinatorial objects that appear in RNA secondary structure comparison. However, the usual representation of tree alignments as supertrees is ambiguous, i.e. two distinct supertrees may induce identical sets of matches between identical pairs of trees. This ambiguity is uninformative, and detrimental to any probabilistic analysis. In this work, we consider tree alignments up to equivalence. Our first result is a precise asymptotic enumeration of tree alignments, obtained from a context-free grammar by means of basic analytic combinatorics. Our second result focuses on alignments between two given ordered trees S and T. By refining our grammar to align specific trees, we obtain a decomposition scheme for the space of alignments, and use it to design an efficient dynamic programming algorithm for sampling alignments under the Gibbs-Boltzmann probability distribution. This generalizes existing tree alignment algorithms, and opens the door for a probabilistic analysis of the space of suboptimal RNA secondary structure alignments. Comment: ALCOB - 3rd International Conference on Algorithms for Computational Biology - 2016, Jun 2016, Trujillo, Spain. 201

    Novel associations for hypothyroidism include known autoimmune risk loci

    Hypothyroidism is the most common thyroid disorder, affecting about 5% of the general population. Here we present the first large genome-wide association study of hypothyroidism, in 2,564 cases and 24,448 controls from the customer base of 23andMe, Inc., a personal genetics company. We identify four genome-wide significant associations, two of which are well known to be involved with a large spectrum of autoimmune diseases: rs6679677 near _PTPN22_ and rs3184504 in _SH2B3_ (p-values 3.5e-13 and 3.0e-11, respectively). We also report associations with rs4915077 near _VAV3_ (p-value 8.3e-11), another gene involved in immune function, and rs965513 near _FOXE1_ (p-value 3.1e-14). Of these, the association with _PTPN22_ confirms a recent small candidate gene study, and _FOXE1_ was previously known to be associated with thyroid-stimulating hormone (TSH) levels. Although _SH2B3_ has been previously linked with a number of autoimmune diseases, this is the first report of its association with thyroid disease. The _VAV3_ association is novel. These results suggest heterogeneity in the genetic etiology of hypothyroidism, implicating genes involved in both autoimmune disorders and thyroid function. Using a genetic risk profile score based on the top association from each of the four genome-wide significant regions in our study, the relative risk between the highest and lowest deciles of genetic risk is 2.1

    Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment

    Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Thus alignment accuracy is crucial to a vast range of analyses, often in ways difficult to assess in those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies--based on simulation, consistency, protein structure, and phylogeny--and discuss their different advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking, and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should choose their benchmark according to the context of application--with a keen awareness of the assumptions underlying each benchmarking strategy. Comment: Revie

    Release of Lungworm Larvae from Snails in the Environment: Potential for Alternative Transmission Pathways

    Background: Gastropod-borne parasites may cause debilitating clinical conditions in animals and humans following the consumption of infected intermediate or paratenic hosts. However, the ingestion of fresh vegetables contaminated by snail mucus and/or water has also been proposed as a source of the infection for some zoonotic metastrongyloids (e.g., Angiostrongylus cantonensis). In the meantime, the feline lungworms Aelurostrongylus abstrusus and Troglostrongylus brevior are increasingly spreading among cat populations, along with their gastropod intermediate hosts. The aim of this study was to assess the potential of alternative transmission pathways for A. abstrusus and T. brevior L3 via the mucus of infected Helix aspersa snails and the water where gastropods died. In addition, the histological examination of snail specimens provided information on the larval localization and inflammatory reactions in the intermediate host. Methodology/Principal Findings: Twenty-four specimens of H. aspersa received ~500 L1 of A. abstrusus and T. brevior, and were assigned to six study groups. Snails were subjected to different mechanical and chemical stimuli over 20 days in order to elicit the production of mucus. At the end of the study, gastropods were submerged in tap water and the sediment was observed for lungworm larvae for three consecutive days. Finally, snails were artificially digested and recovered larvae were counted and morphologically and molecularly identified. The anatomical localization of A. abstrusus and T. brevior larvae within snail tissues was investigated by histology. L3 were detected in the snail mucus (i.e., 37 A. abstrusus and 19 T. brevior) and in the sediment of submerged specimens (172 A. abstrusus and 39 T. brevior). Following the artificial digestion of H. aspersa snails, a mean number of 127.8 A. abstrusus and 60.3 T. brevior larvae were recovered. The number of snail sections positive for A. abstrusus was higher than those for T. brevior. Conclusions: Results of this study indicate that A. abstrusus and T. brevior infective L3 are shed in the mucus of H. aspersa or in water where infected gastropods had died submerged. Both elimination pathways may represent alternative route(s) of environmental contamination and source of the infection for these nematodes under field conditions and may significantly affect the epidemiology of feline lungworms. Considering that snails may act as intermediate hosts for other metastrongyloid species, the environmental contamination by mucus-released larvae is discussed in a broader context.

    RNA secondary structure prediction from multi-aligned sequences

    It is well accepted that the secondary structures of most functional non-coding RNAs (ncRNAs) are closely related to their functions and are conserved during evolution. Hence, prediction of conserved secondary structures from evolutionarily related sequences is an important task in RNA bioinformatics; the methods are useful not only to further functional analyses of ncRNAs but also to improve the accuracy of secondary structure predictions and to find novel functional RNAs in the genome. In this review, I focus on common secondary structure prediction from a given aligned RNA sequence, in which one secondary structure whose length is equal to that of the input alignment is predicted. I systematically review and classify existing tools and algorithms for the problem, by the information they employ and by adopting a unified viewpoint based on maximum expected gain (MEG) estimators. I believe that this classification will allow a deeper understanding of each tool and provide users with useful information for selecting tools for common secondary structure predictions. Comment: A preprint of an invited review manuscript that will be published in a chapter of the book 'Methods in Molecular Biology'. Note that this version of the manuscript may differ from the published versio

    Inferring stabilizing mutations from protein phylogenies : application to influenza hemagglutinin

    One selection pressure shaping sequence evolution is the requirement that a protein fold with sufficient stability to perform its biological functions. We present a conceptual framework that explains how this requirement causes the probability that a particular amino acid mutation is fixed during evolution to depend on its effect on protein stability. We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (ΔΔG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. As a further test, we use our phylogenetic inference approach to predict stabilizing mutations to influenza hemagglutinin. We introduce these mutations into a temperature-sensitive influenza virus with a defect in its hemagglutinin gene and experimentally demonstrate that some of the mutations allow the virus to grow at higher temperatures. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin. This approach also makes a mathematical link between phylogenetics and experimentally measurable protein properties, potentially paving the way for more accurate analyses of molecular evolution

    Rare coding SNP in DZIP1 gene associated with late-onset sporadic Parkinson's disease

    We present the first application of the hypothesis-rich mathematical theory to genome-wide association data. The Hamza et al. late-onset sporadic Parkinson's disease genome-wide association study dataset was analyzed. We found a rare, coding, non-synonymous SNP variant in the gene DZIP1 that confers increased susceptibility to Parkinson's disease. The association of DZIP1 with Parkinson's disease is consistent with a Parkinson's disease stem-cell ageing theory. Comment: 14 page

    Evolutionary distances in the twilight zone -- a rational kernel approach

    Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets. Comment: to appear in PLoS ON

    Observation of associated near-side and away-side long-range correlations in $\sqrt{s_{NN}} = 5.02$ TeV proton-lead collisions with the ATLAS detector

    Two-particle correlations in relative azimuthal angle ($\Delta\phi$) and pseudorapidity ($\Delta\eta$) are measured in $\sqrt{s_{NN}} = 5.02$ TeV p+Pb collisions using the ATLAS detector at the LHC. The measurements are performed using approximately 1 $\mu\mathrm{b}^{-1}$ of data as a function of transverse momentum ($p_T$) and the transverse energy ($\Sigma E_T^{\mathrm{Pb}}$) summed over $3.1 < \eta < 4.9$ in the direction of the Pb beam. The correlation function, constructed from charged particles, exhibits a long-range ($2 < |\Delta\eta| < 5$) “near-side” ($\Delta\phi \sim 0$) correlation that grows rapidly with increasing $\Sigma E_T^{\mathrm{Pb}}$. A long-range “away-side” ($\Delta\phi \sim \pi$) correlation, obtained by subtracting the expected contributions from recoiling dijets and other sources estimated using events with small $\Sigma E_T^{\mathrm{Pb}}$, is found to match the near-side correlation in magnitude, shape (in $\Delta\eta$ and $\Delta\phi$) and $\Sigma E_T^{\mathrm{Pb}}$ dependence. The resultant $\Delta\phi$ correlation is approximately symmetric about $\pi/2$, and is consistent with a dominant $\cos 2\Delta\phi$ modulation for all $\Sigma E_T^{\mathrm{Pb}}$ ranges and particle $p_T$.

    Search for direct pair production of the top squark in all-hadronic final states in proton-proton collisions at $\sqrt{s} = 8$ TeV with the ATLAS detector

    The results of a search for direct pair production of the scalar partner to the top quark using an integrated luminosity of 20.1 $\mathrm{fb}^{-1}$ of proton–proton collision data at $\sqrt{s} = 8$ TeV recorded with the ATLAS detector at the LHC are reported. The top squark is assumed to decay via $\tilde{t} \to t\tilde{\chi}_1^0$ or $\tilde{t} \to b\tilde{\chi}_1^\pm \to bW^{(*)}\tilde{\chi}_1^0$, where $\tilde{\chi}_1^0$ ($\tilde{\chi}_1^\pm$) denotes the lightest neutralino (chargino) in supersymmetric models. The search targets a fully-hadronic final state in events with four or more jets and large missing transverse momentum. No significant excess over the Standard Model background prediction is observed, and exclusion limits are reported in terms of the top squark and neutralino masses and as a function of the branching fraction of $\tilde{t} \to t\tilde{\chi}_1^0$. For a branching fraction of 100%, top squark masses in the range 270–645 GeV are excluded for $\tilde{\chi}_1^0$ masses below 30 GeV. For a branching fraction of 50% to either $\tilde{t} \to t\tilde{\chi}_1^0$ or $\tilde{t} \to b\tilde{\chi}_1^\pm$, and assuming the $\tilde{\chi}_1^\pm$ mass to be twice the $\tilde{\chi}_1^0$ mass, top squark masses in the range 250–550 GeV are excluded for $\tilde{\chi}_1^0$ masses below 60 GeV.