    A surrogate function for one-dimensional phylogenetic likelihoods

    Phylogenetics has seen a steady increase in substitution model complexity, which requires increasing amounts of computational power to compute likelihoods. This model complexity motivates strategies to approximate the likelihood functions for branch length optimization and Bayesian sampling. In this paper, we develop an approximation to the one-dimensional likelihood function as parametrized by a single branch length. This new method uses a four-parameter surrogate function abstracted from the simplest phylogenetic likelihood function, the binary symmetric model. We show that the surrogate can be fit over a variety of branch lengths, that it is applicable to a wide variety of models and trees, and that it can be used effectively as a proposal mechanism for Bayesian sampling. The method is implemented as a stand-alone open-source C library for calling from phylogenetics algorithms; it has proven essential for good performance of our online phylogenetic algorithm sts.
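
    The binary symmetric model (BSM) that the surrogate is abstracted from has a closed-form one-branch log-likelihood. The Python sketch below shows that curve together with one plausible four-parameter generalization of it; the parametrization (c, m, r, b) and all function names are our own assumptions for illustration, not necessarily the form used in the paper or its C library. With c and m set to the counts of identical and differing sites, r = 2 and b = 0, the generalization reduces exactly to the BSM curve, and under this assumed form the maximizer has a closed form.

```python
import math


def bsm_log_likelihood(t, n_same, n_diff):
    """BSM log-likelihood of branch length t for two binary sequences,
    up to an additive constant (the root-state probability is dropped)."""
    x = math.exp(-2.0 * t)
    return n_same * math.log((1.0 + x) / 2.0) + n_diff * math.log((1.0 - x) / 2.0)


def surrogate(t, c, m, r, b):
    """Hypothetical four-parameter generalization of the BSM curve:
    c and m weight the 'same' and 'different' terms, r rescales t, b shifts it."""
    x = math.exp(-r * (t + b))  # requires t + b > 0 so that x < 1
    return c * math.log((1.0 + x) / 2.0) + m * math.log((1.0 - x) / 2.0)


def surrogate_mode(c, m, r, b):
    """Maximizer of the surrogate. Setting its derivative to zero gives
    exp(-r (t + b)) = (c - m) / (c + m), which needs c > m > 0."""
    if not c > m > 0:
        raise ValueError("need c > m > 0 for an interior maximum")
    return -math.log((c - m) / (c + m)) / r - b


# With c = n_same, m = n_diff, r = 2, b = 0 the surrogate is the BSM curve,
# and its mode is the BSM maximum-likelihood branch length -log(1 - 2p) / 2.
n_same, n_diff = 90, 10
t_hat = surrogate_mode(n_same, n_diff, 2.0, 0.0)
print(round(t_hat, 4))  # 0.1116
```

    Under this assumed form, once the four parameters are fit to a handful of exact likelihood evaluations, the optimum can be read off without further likelihood calls, which is one reason a surrogate of this shape is convenient for branch length optimization.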

    PATRISTIC: a program for calculating patristic distances and graphically comparing the components of genetic change

    BACKGROUND: Phylogenies are commonly used to analyse the differences between genes, genomes and species. Patristic distances calculated from tree branch lengths describe the amount of genetic change represented by a tree and are commonly compared with other measures of mutation to investigate the substitutional processes or the goodness of fit of a tree to the raw data. Until now, no universal tool has been available for calculating patristic distances and correlating them with other genetic distance measures. RESULTS: PATRISTIC v1.0 is a Java program that calculates patristic distances from large trees in a range of file formats and allows graphical and statistical interpretation of distance matrices calculated by other programs. CONCLUSION: The software overcomes some logistical barriers to analysing signals in sequences. In addition to calculating patristic distances, it provides plots for any combination of matrices, calculates commonly used statistics, allows data such as isolation dates to be entered and reorders matrices with matching species or gene labels. It will be used to analyse rates of mutation and substitutional saturation and the evolution of viruses. It is available at and requires the Java runtime environment.
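
    A patristic distance is the sum of branch lengths on the path joining two leaves. The sketch below computes all pairwise patristic distances for a hypothetical three-leaf tree stored as a child-to-parent map; the tree, its encoding and the function names are illustrative assumptions, and this is not how PATRISTIC itself is implemented.

```python
from itertools import combinations

# Hypothetical tree ((A:1,B:2):3,C:4) stored as child -> (parent, branch length).
PARENT = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 3.0),
    "C": ("root", 4.0),
}


def path_lengths_to_ancestors(node, parent):
    """Map node and each of its ancestors to the path length from node."""
    out, dist = {node: 0.0}, 0.0
    while node in parent:
        node, length = parent[node]
        dist += length
        out[node] = dist
    return out


def patristic(a, b, parent):
    """Sum of branch lengths on the path from leaf a to leaf b."""
    up_a = path_lengths_to_ancestors(a, parent)
    node, dist_b = b, 0.0
    # Climb from b until the first ancestor of a is reached: their MRCA.
    while node not in up_a:
        node, length = parent[node]
        dist_b += length
    return up_a[node] + dist_b


def patristic_matrix(leaves, parent):
    """All pairwise patristic distances, keyed by leaf pair."""
    return {(a, b): patristic(a, b, parent) for a, b in combinations(leaves, 2)}


print(patristic_matrix(["A", "B", "C"], PARENT))
# {('A', 'B'): 3.0, ('A', 'C'): 8.0, ('B', 'C'): 9.0}
```

    Each distance costs at most two root-ward walks, so the full matrix for a tree with n leaves needs on the order of n squared such walks; a matrix like this is what would then be plotted against, or correlated with, other genetic distance measures.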

    Fidelity of Hyperbolic Space for Bayesian Phylogenetic Inference

    Bayesian inference for phylogenetics is a gold standard for computing distributions of phylogenies. It faces the challenging problem of moving throughout the high-dimensional space of trees. However, hyperbolic space offers a low-dimensional representation of tree-like data. In this paper, we embed genomic sequences into hyperbolic space and perform hyperbolic Markov chain Monte Carlo for Bayesian inference. The posterior probability is computed by decoding a neighbour joining tree from proposed embedding locations. We empirically demonstrate the fidelity of this method on eight data sets. The sampled posterior distribution recovers the splits and branch lengths to a high degree. We investigate the effects of curvature and embedding dimension on the Markov chain's performance. Finally, we discuss the prospects for adapting this method to navigate tree space with gradients.
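
    The decoding step, from embedding locations to a tree, can be sketched as pairwise hyperbolic distances followed by ordinary neighbour joining. This sketch assumes the Poincaré ball model with curvature -k and uses hypothetical coordinates and function names; the paper's choice of hyperbolic model, and how it parametrizes curvature, may differ.

```python
import math
from itertools import combinations


def poincare_distance(u, v, k=1.0):
    """Geodesic distance in the Poincare ball of curvature -k (radius 1/sqrt(k))."""
    def sq(x):
        return sum(xi * xi for xi in x)
    num = 2.0 * k * sq([a - b for a, b in zip(u, v)])
    den = (1.0 - k * sq(u)) * (1.0 - k * sq(v))
    return math.acosh(1.0 + num / den) / math.sqrt(k)


def embedding_distances(points, k=1.0):
    """Pairwise hyperbolic distances keyed by frozenset({name_i, name_j})."""
    return {frozenset((a, b)): poincare_distance(points[a], points[b], k)
            for a, b in combinations(points, 2)}


def neighbour_joining(names, dist):
    """Standard neighbour joining (Saitou and Nei 1987).

    Returns (joins, last_edge): each join is (i, j, new_node, len_i, len_j),
    and last_edge = (a, b, length) links the final two clusters.
    """
    nodes, d, joins = list(names), dict(dist), []
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(d[frozenset((i, j))] for j in nodes if j != i) for i in nodes}
        # Q-criterion: join the pair minimizing (n - 2) d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        len_i = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        new = f"({i},{j})"
        for m in nodes:
            if m not in (i, j):
                d[frozenset((new, m))] = (d[frozenset((i, m))] + d[frozenset((j, m))] - dij) / 2
        nodes = [m for m in nodes if m not in (i, j)] + [new]
        joins.append((i, j, new, len_i, dij - len_i))
    a, b = nodes
    return joins, (a, b, d[frozenset((a, b))])


# Hypothetical 2-D embedding locations for four taxa (all inside the unit ball).
points = {"A": (0.1, 0.6), "B": (-0.1, 0.55), "C": (0.5, -0.3), "D": (0.45, -0.4)}
joins, last_edge = neighbour_joining(list(points), embedding_distances(points))
for join in joins:
    print(join)
print(last_edge)
```

    In an MCMC built this way, every proposed move of the embedding changes the distance matrix and hence possibly the decoded tree, so the likelihood is evaluated on a freshly decoded tree each time; this sketch covers only the distance and decoding part, not the sampler.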

    Avian influenza virus exhibits distinct evolutionary dynamics in wild birds and poultry

    BACKGROUND: Wild birds are the major reservoir hosts for influenza A viruses, occasionally transmitting to other species such as domesticated poultry. Despite an abundance of genomic data from avian influenza virus (AIV), little is known about whether AIV evolves differently in wild birds and poultry, although this is critical to revealing the dynamics and time-scale of viral evolution. In particular, because environmental (water-borne) transmission is more common in wild birds, which may reduce the number of replications per unit time, it is possible that evolutionary rates are systematically lower in wild birds than in poultry. RESULTS: We estimated rates of nucleotide substitution in two AIV subtypes that are strongly associated with infections in wild birds (H4 and H6) and compared these to rates in the H5N1 subtype that has circulated in poultry for almost two decades. Our analyses of three internal genes confirm that H4 and H6 viruses are evolving significantly more slowly than H5N1 viruses, suggesting that evolutionary rates of AIV are reduced in wild birds. This result was verified by the analysis of a poultry-associated H6 lineage that exhibited a markedly higher substitution rate than those H6 viruses circulating in wild birds. Interestingly, we also observed a significant difference in evolutionary rate between H4 and H6, despite frequent reassortment between them. CONCLUSIONS: AIV experiences markedly different evolutionary dynamics between wild birds and poultry. These results suggest that rate heterogeneity among viral subtypes and ecological groupings should be taken into account when estimating evolutionary rates and divergence times. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12862-015-0410-5) contains supplementary material, which is available to authorized users.

    19 Dubious Ways to Compute the Marginal Likelihood of a Phylogenetic Tree Topology

    The marginal likelihood of a model is a key quantity for assessing the evidence provided by the data in support of a model. The marginal likelihood is the normalizing constant for the posterior density, obtained by integrating the product of the likelihood and the prior with respect to model parameters. Thus, the computational burden of computing the marginal likelihood scales with the dimension of the parameter space. In phylogenetics, where we work with tree topologies that are high-dimensional models, standard approaches to computing marginal likelihoods are very slow. Here, we study methods to quickly compute the marginal likelihood of a single fixed tree topology. We benchmark the speed and accuracy of 19 different methods to compute the marginal likelihood of phylogenetic topologies on a suite of real data sets under the JC69 model. These methods include several new ones that we develop explicitly to solve this problem, as well as existing algorithms that we apply to phylogenetic models for the first time. Altogether, our results show that the accuracy of these methods varies widely, and that accuracy does not necessarily correlate with computational burden. Our newly developed methods are orders of magnitude faster than standard approaches, and in some cases, their accuracy rivals the best established estimators.
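
    For the smallest possible case, two aligned DNA sequences whose tree has a single branch length t, the marginal likelihood under JC69 is a one-dimensional integral. The sketch below assumes an exponential prior on t and summarizes the data as counts of identical and differing sites (both our assumptions, with hypothetical function names), and compares the simplest Monte Carlo estimator, averaging the likelihood over draws from the prior, with a grid quadrature reference. An unrooted binary tree with n leaves has 2n - 3 branch lengths, which is why grid quadrature does not carry over to real topologies.

```python
import math
import random


def jc69_log_likelihood(t, n_same, n_diff):
    """JC69 log-likelihood of distance t between two aligned DNA sequences
    with n_same identical and n_diff differing sites (t > 0)."""
    e = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)  # pi_x * P_xx(t)
    p_diff = 0.25 * (0.25 - 0.25 * e)  # pi_x * P_xy(t) for one specific y != x
    return n_same * math.log(p_same) + n_diff * math.log(p_diff)


def log_mean_exp(values):
    """log(mean(exp(values))), computed stably by factoring out the maximum."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))


def log_ml_prior_mc(n_same, n_diff, rate, n_samples=100_000, seed=1):
    """Naive estimator: average the likelihood over draws t ~ Exponential(rate)."""
    rng = random.Random(seed)
    return log_mean_exp([jc69_log_likelihood(rng.expovariate(rate), n_same, n_diff)
                         for _ in range(n_samples)])


def log_ml_quadrature(n_same, n_diff, rate, upper=5.0, n_grid=50_000):
    """Midpoint rule for the integral of likelihood x prior over (0, upper)."""
    h = upper / n_grid
    vals = [jc69_log_likelihood((i + 0.5) * h, n_same, n_diff)
            + math.log(rate) - rate * (i + 0.5) * h
            for i in range(n_grid)]
    return log_mean_exp(vals) + math.log(upper)  # sum(f) * h == mean(f) * upper


# Hypothetical data: 20 sites, 3 of which differ; prior mean branch length 0.1.
mc = log_ml_prior_mc(17, 3, rate=10.0)
ref = log_ml_quadrature(17, 3, rate=10.0)
print(f"prior-sampling MC: {mc:.4f}   quadrature: {ref:.4f}")
```

    Prior sampling works here because prior and likelihood overlap well in one dimension; as the number of branch lengths grows, most prior draws land where the likelihood is negligible, and the variance of this estimator grows quickly.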

    Torchtree: flexible phylogenetic model development and inference using PyTorch

    Bayesian inference has predominantly relied on the Markov chain Monte Carlo (MCMC) algorithm for many years. However, MCMC is computationally laborious, especially for complex phylogenetic models of time trees. This bottleneck has led to the search for alternatives, such as variational Bayes, which can scale better to large datasets. In this paper, we introduce torchtree, a framework written in Python that allows developers to easily implement rich phylogenetic models and algorithms using a fixed tree topology. One can either use automatic differentiation, or leverage torchtree's plug-in system to compute gradients analytically for model components for which automatic differentiation is slow. We demonstrate that the torchtree variational inference framework performs similarly to BEAST in terms of speed and approximation accuracy. Furthermore, we explore the use of the forward KL divergence as an optimizing criterion for variational inference, which can handle discontinuous and non-differentiable models. Our experiments show that inference using the forward KL divergence tends to be faster per iteration compared to the evidence lower bound (ELBO) criterion, although the ELBO-based inference may converge faster in some cases. Overall, torchtree provides a flexible and efficient framework for phylogenetic model development and inference using PyTorch. Comment: 23 pages, 3 tables, and 4 figures in main text, plus supplementary material.

    Automatic differentiation is no panacea for phylogenetic gradient computation

    Gradients of probabilistic model likelihoods with respect to their parameters are essential for modern computational statistics and machine learning. These calculations are readily available for arbitrary models via automatic differentiation implemented in general-purpose machine-learning libraries such as TensorFlow and PyTorch. Although these libraries are highly optimized, it is not clear whether their general-purpose nature will limit their algorithmic complexity or implementation speed for the phylogenetic case compared to phylogenetics-specific code. In this paper, we compare six gradient implementations of the phylogenetic likelihood functions, in isolation and also as part of a variational inference procedure. We find that although automatic differentiation can scale approximately linearly in tree size, it is much slower than the carefully implemented gradient calculation for tree likelihood and ratio transformation operations. We conclude that a mixed approach combining phylogenetic libraries with machine learning libraries will provide the optimal combination of speed and model flexibility moving forward. Comment: 15 pages and 2 figures in main text, plus supplementary material.
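
    A hand-written gradient with respect to one branch length typically replaces that branch's transition matrix P(t) by its derivative dP/dt and reuses everything else. The sketch below does this for JC69 on a three-leaf star tree with a uniform root distribution and checks it against central finite differences; the tree, the site patterns and the function names are assumptions for illustration. On larger trees the same substitution is commonly combined with post-order and pre-order partial likelihood passes so that all branch gradients come from a small number of traversals, which this sketch does not show.

```python
import math

BASES = "ACGT"


def jc_probs(t):
    """JC69 (P_same, P_diff) for branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e


def jc_probs_deriv(t):
    """Derivatives of (P_same, P_diff) with respect to t."""
    e = math.exp(-4.0 * t / 3.0)
    return -e, e / 3.0


def site_likelihood(mats, site):
    """Star tree: sum over root state x of pi_x * prod_i M_i[x, leaf_i]."""
    return sum(0.25 * math.prod((s if x == y else d) for (s, d), y in zip(mats, site))
               for x in BASES)


def log_likelihood(ts, sites):
    mats = [jc_probs(t) for t in ts]
    return sum(math.log(site_likelihood(mats, site)) for site in sites)


def analytic_gradient(ts, sites):
    """d logL / d t_k: swap branch k's P(t_k) for dP/dt_k in each site sum."""
    mats = [jc_probs(t) for t in ts]
    derivs = [jc_probs_deriv(t) for t in ts]
    grad = [0.0] * len(ts)
    for site in sites:
        lik = site_likelihood(mats, site)
        for k in range(len(ts)):
            swapped = mats[:k] + [derivs[k]] + mats[k + 1:]
            grad[k] += site_likelihood(swapped, site) / lik
    return grad


def finite_difference_gradient(ts, sites, h=1e-6):
    """Central differences, used only to check the analytic gradient."""
    grad = []
    for k in range(len(ts)):
        up, down = list(ts), list(ts)
        up[k] += h
        down[k] -= h
        grad.append((log_likelihood(up, sites) - log_likelihood(down, sites)) / (2 * h))
    return grad


# Hypothetical alignment columns (one base per leaf) and branch lengths.
sites = ["AAA", "AAC", "ACG", "GGG", "TTA", "CCC"]
ts = [0.1, 0.2, 0.3]
print(analytic_gradient(ts, sites))
print(finite_difference_gradient(ts, sites))
```

    The analytic version needs one extra site sum per branch, whereas finite differences need two full likelihood evaluations per branch and lose precision to rounding; automatic differentiation would avoid the hand derivation, at the runtime cost the abstract describes.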

    Escherichia coli ST8196 is a novel, locally evolved, and extensively drug resistant pathogenic lineage within the ST131 clonal complex

    The H30Rx subclade of Escherichia coli ST131 is a clinically important, globally dispersed pathogenic lineage that typically displays resistance to fluoroquinolones and extended spectrum β-lactams. Isolates EC233 and EC234, variants of ST131-H30Rx with a novel sequence type (ST) 8196, isolated from unrelated patients presenting with bacteraemia at a Sydney hospital in 2014, are characterised here. EC233 and EC234 are phylogroup B2, serotype O25:H4A, and resistant to ampicillin, amoxicillin, cefoxitin, ceftazidime, ceftriaxone, ciprofloxacin, norfloxacin and gentamicin, and are likely clonal. Both harbour an IncFII_2 plasmid (pSPRC_Ec234-FII) that carries most of the resistance genes on an IS26-associated translocatable unit, as well as two small plasmids and a novel IncI1 plasmid (pSPRC_Ec234-I). SNP-based phylogenetic analysis of the core genome of representatives within the ST131 clonal complex places both isolates in a subclade with three clinical Australian ST131-H30Rx clade-C isolates. A MrBayes phylogenetic analysis of EC233 and EC234 indicates that ST8196 shares a most recent common ancestor with ST131-H30Rx strain EC70, isolated from the same hospital in 2013. Our study identified genomic hallmarks that define the ST131-H30Rx subclade in the ST8196 isolates and highlights a need for unbiased genomic surveillance approaches to identify novel high-risk MDR E. coli pathogens that impact healthcare facilities.