98 research outputs found

A surrogate function for one-dimensional phylogenetic likelihoods

Author: Brian C Claywell
Connor O McCoy
Frederick A Matsen IV
Mathieu Fourment
Vu Dinh
Publication date: 02/06/2017

Phylogenetics has seen a steady increase in substitution model complexity, which requires increasing amounts of computational power to compute likelihoods. This model complexity motivates strategies to approximate the likelihood functions for branch length optimization and Bayesian sampling. In this paper, we develop an approximation to the one-dimensional likelihood function as parametrized by a single branch length. This new method uses a four-parameter surrogate function abstracted from the simplest phylogenetic likelihood function, the binary symmetric model. We show that it offers a surrogate that can be fit over a variety of branch lengths, that it is applicable to a wide variety of models and trees, and that it can be used effectively as a proposal mechanism for Bayesian sampling. The method is implemented as a stand-alone open-source C library for calling from phylogenetics algorithms; it has proven essential for good performance of our online phylogenetic algorithm sts.

arXiv.org e-Print Archive

Crossref
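
The four-parameter surrogate described in the abstract above can be illustrated with a minimal sketch. The functional form used here is an assumption modeled on a binary symmetric log-likelihood (c "constant" sites, m "mutated" sites, a rate r and a branch-length offset b); the abstract does not give the formula, and the function names are hypothetical rather than the C library's API.

```python
import math


def surrogate_log_likelihood(t, c, m, r, b):
    """Assumed surrogate: c*log((1 + e)/2) + m*log((1 - e)/2), e = exp(-r*(t + b)).

    Requires t + b > 0 whenever m > 0 so that the second logarithm is defined.
    """
    decay = math.exp(-r * (t + b))
    return c * math.log((1.0 + decay) / 2.0) + m * math.log((1.0 - decay) / 2.0)


def surrogate_argmax(c, m, r, b):
    """Branch length maximizing the assumed surrogate on t >= 0.

    Setting the derivative to zero gives exp(-r*(t + b)) = (c - m)/(c + m).
    When c <= m the function keeps increasing, so the maximizer is infinite.
    """
    if c <= m:
        return math.inf
    t_hat = math.log((c + m) / (c - m)) / r - b
    return max(t_hat, 0.0)  # clip to the valid range of branch lengths


if __name__ == "__main__":
    best = surrogate_argmax(900.0, 100.0, 1.0, 0.1)
    print(best, surrogate_log_likelihood(best, 900.0, 100.0, 1.0, 0.1))
```

Under this assumed form the maximizer is available in closed form, which is one reason a surrogate of this kind is cheap to use once its four parameters have been fitted to a handful of true likelihood evaluations.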

PATRISTIC: a program for calculating patristic distances and graphically comparing the components of genetic change

Author: Fourment Mathieu
Gibbs Mark J
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Phylogenies are commonly used to analyse the differences between genes, genomes and species. Patristic distances calculated from tree branch lengths describe the amount of genetic change represented by a tree and are commonly compared with other measures of mutation to investigate the substitutional processes or the goodness of fit of a tree to the raw data. Up until now no universal tool has been available for calculating patristic distances and correlating them with other genetic distance measures. RESULTS: PATRISTICv1.0 is a Java program that calculates patristic distances from large trees in a range of file formats and allows graphical and statistical interpretation of distance matrices calculated by other programs. CONCLUSION: The software overcomes some logistic barriers to analysing signals in sequences. In addition to calculating patristic distances, it provides plots for any combination of matrices, calculates commonly used statistics, allows data such as isolation dates to be entered and reorders matrices with matching species or gene labels. It will be used to analyse rates of mutation and substitutional saturation and the evolution of viruses. It is available at and requires the Java runtime environment.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The Australian National University
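
A patristic distance, as used in the PATRISTIC abstract above, is the sum of branch lengths along the path joining two tips of a tree. The sketch below shows that calculation on a toy tree; the parent-map representation, the node names and the function names are illustrative assumptions and are not taken from the PATRISTIC program.

```python
def path_to_root(node, parent):
    """Map each ancestor of `node` (and `node` itself) to its distance from `node`."""
    distances = {node: 0.0}
    total = 0.0
    while node in parent:
        node, length = parent[node]
        total += length
        distances[node] = total
    return distances


def patristic_distance(a, b, parent):
    """Sum of branch lengths on the path between nodes a and b.

    `parent` maps a node name to (parent name, branch length); the root has no entry.
    """
    from_a = path_to_root(a, parent)
    node, total = b, 0.0
    # Climb from b until reaching a node that is also an ancestor of a.
    while node not in from_a:
        node, length = parent[node]
        total += length
    return total + from_a[node]


# Toy tree, Newick equivalent: ((A:0.1,B:0.2):0.3,C:0.5);
TOY_TREE = {
    "A": ("AB", 0.1),
    "B": ("AB", 0.2),
    "AB": ("root", 0.3),
    "C": ("root", 0.5),
}

if __name__ == "__main__":
    for pair in (("A", "B"), ("A", "C"), ("B", "C")):
        print(pair, patristic_distance(pair[0], pair[1], TOY_TREE))
```

Computing this for every pair of tips yields the patristic distance matrix that can then be compared with other genetic distance matrices.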

Fidelity of Hyperbolic Space for Bayesian Phylogenetic Inference

Author: Darling Aaron E.
Fourment Mathieu
Macaulay Matthew
Publication date: 16/06/2022

Bayesian inference for phylogenetics is a gold standard for computing distributions of phylogenies. It faces the challenging problem of moving throughout the high-dimensional space of trees. However, hyperbolic space offers a low dimensional representation of tree-like data. In this paper, we embed genomic sequences into hyperbolic space and perform hyperbolic Markov Chain Monte Carlo for Bayesian inference. The posterior probability is computed by decoding a neighbour joining tree from proposed embedding locations. We empirically demonstrate the fidelity of this method on eight data sets. The sampled posterior distribution recovers the splits and branch lengths to a high degree. We investigated the effects of curvature and embedding dimension on the Markov Chain's performance. Finally, we discuss the prospects for adapting this method to navigate tree space with gradients.

arXiv.org e-Print Archive

Avian influenza virus exhibits distinct evolutionary dynamics in wild birds and poultry

Author: Edward C Holmes
Mathieu Fourment
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

BACKGROUND: Wild birds are the major reservoir hosts for influenza A viruses, occasionally transmitting to other species such as domesticated poultry. Despite an abundance of genomic data from avian influenza virus (AIV), little is known about whether AIV evolves differently in wild birds and poultry, although this is critical to revealing the dynamics and time-scale of viral evolution. In particular, because environmental (water-borne) transmission is more common in wild birds, which may reduce the number of replications per unit time, it is possible that evolutionary rates are systematically lower in wild birds than in poultry. RESULTS: We estimated rates of nucleotide substitution in two AIV subtypes that are strongly associated with infections in wild birds – H4 and H6 – and compared these to rates in the H5N1 subtype that has circulated in poultry for almost two decades. Our analyses of three internal genes confirm that H4 and H6 viruses are evolving significantly more slowly than H5N1 viruses, suggesting that evolutionary rates of AIV are reduced in wild birds. This result was verified by the analysis of a poultry-associated H6 lineage that exhibited a markedly higher substitution rate than those H6 viruses circulating in wild birds. Interestingly, we also observed a significant difference in evolutionary rate between H4 and H6, despite frequent reassortment among them. CONCLUSIONS: AIV experiences markedly different evolutionary dynamics between wild birds and poultry. These results suggest that rate heterogeneity among viral subtypes and ecological groupings should be taken into account when estimating evolutionary rates and divergence times. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12862-015-0410-5) contains supplementary material, which is available to authorized users.

Crossref

Springer - Publisher Connector

PubMed Central

19 Dubious Ways to Compute the Marginal Likelihood of a Phylogenetic Tree Topology

Author: Bilge Arman
Fourment Mathieu
Magee Andrew F
Matsen Frederick A
Minin Vladimir N
Whidden Chris
Publication venue: eScholarship, University of California
Publication date: 01/03/2020

The marginal likelihood of a model is a key quantity for assessing the evidence provided by the data in support of a model. The marginal likelihood is the normalizing constant for the posterior density, obtained by integrating the product of the likelihood and the prior with respect to model parameters. Thus, the computational burden of computing the marginal likelihood scales with the dimension of the parameter space. In phylogenetics, where we work with tree topologies that are high-dimensional models, standard approaches to computing marginal likelihoods are very slow. Here, we study methods to quickly compute the marginal likelihood of a single fixed tree topology. We benchmark the speed and accuracy of 19 different methods to compute the marginal likelihood of phylogenetic topologies on a suite of real data sets under the JC69 model. These methods include several new ones that we develop explicitly to solve this problem, as well as existing algorithms that we apply to phylogenetic models for the first time. Altogether, our results show that the accuracy of these methods varies widely, and that accuracy does not necessarily correlate with computational burden. Our newly developed methods are orders of magnitude faster than standard approaches, and in some cases, their accuracy rivals the best established estimators.

OPUS - University of Technology Sydney

eScholarship - University of California
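
The definition of the marginal likelihood in the abstract above (the integral of likelihood times prior over the model parameters) can be made concrete on a deliberately tiny problem. The sketch below assumes a two-sequence JC69 model with a single branch length and an exponential prior, and compares numerical quadrature with averaging the likelihood over prior draws. The data values, the prior rate and the function names are illustrative assumptions, and the two estimators are generic textbook choices rather than a reproduction of the methods benchmarked in the paper.

```python
import math
import random


def jc69_likelihood(t, n_sites, n_diff):
    """Likelihood of two aligned sequences separated by branch length t under JC69."""
    decay = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * decay)  # joint probability of one matching site
    p_diff = 0.25 * (0.25 - 0.25 * decay)  # joint probability of one mismatching site
    return p_same ** (n_sites - n_diff) * p_diff ** n_diff


def marginal_by_quadrature(n_sites, n_diff, rate, upper=5.0, steps=5000):
    """Trapezoid rule for the integral of likelihood(t) * Exp(rate) prior density."""
    h = upper / steps

    def integrand(t):
        return jc69_likelihood(t, n_sites, n_diff) * rate * math.exp(-rate * t)

    total = 0.5 * (integrand(0.0) + integrand(upper))
    total += sum(integrand(i * h) for i in range(1, steps))
    return total * h


def marginal_by_prior_sampling(n_sites, n_diff, rate, draws=20000, seed=1):
    """Average the likelihood over branch lengths drawn from the prior."""
    rng = random.Random(seed)
    total = sum(
        jc69_likelihood(rng.expovariate(rate), n_sites, n_diff) for _ in range(draws)
    )
    return total / draws


if __name__ == "__main__":
    # Hypothetical data: 50 aligned sites, 5 of which differ; prior mean 0.1.
    print(marginal_by_quadrature(50, 5, 10.0))
    print(marginal_by_prior_sampling(50, 5, 10.0))
```

With one parameter both approaches are cheap. Quadrature grids grow exponentially with the number of branch lengths, and prior sampling degrades as the posterior concentrates relative to the prior, which is the scaling problem the abstract refers to.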

Torchtree: flexible phylogenetic model development and inference using PyTorch

Author: Fourment Mathieu
Ji Xiang
Macaulay Matthew
Matsen IV Frederick A
Suchard Marc A
Swanepoel Christiaan J
Publication date: 25/06/2024

Bayesian inference has predominantly relied on the Markov chain Monte Carlo (MCMC) algorithm for many years. However, MCMC is computationally laborious, especially for complex phylogenetic models of time trees. This bottleneck has led to the search for alternatives, such as variational Bayes, which can scale better to large datasets. In this paper, we introduce torchtree, a framework written in Python that allows developers to easily implement rich phylogenetic models and algorithms using a fixed tree topology. One can either use automatic differentiation, or leverage torchtree's plug-in system to compute gradients analytically for model components for which automatic differentiation is slow. We demonstrate that the torchtree variational inference framework performs similarly to BEAST in terms of speed and approximation accuracy. Furthermore, we explore the use of the forward KL divergence as an optimizing criterion for variational inference, which can handle discontinuous and non-differentiable models. Our experiments show that inference using the forward KL divergence tends to be faster per iteration compared to the evidence lower bound (ELBO) criterion, although the ELBO-based inference may converge faster in some cases. Overall, torchtree provides a flexible and efficient framework for phylogenetic model development and inference using PyTorch.

Comment: 23 pages, 3 tables, and 4 figures in main text, plus supplementary material

arXiv.org e-Print Archive

Automatic differentiation is no panacea for phylogenetic gradient computation

Author: Fourment Mathieu
Galloway Jared G.
Gangavarapu Karthik
Ji Xiang
Matsen IV Frederick A.
Suchard Marc A.
Swanepoel Christiaan J.
Publication date: 03/11/2022

Gradients of probabilistic model likelihoods with respect to their parameters are essential for modern computational statistics and machine learning. These calculations are readily available for arbitrary models via automatic differentiation implemented in general-purpose machine-learning libraries such as TensorFlow and PyTorch. Although these libraries are highly optimized, it is not clear if their general-purpose nature will limit their algorithmic complexity or implementation speed for the phylogenetic case compared to phylogenetics-specific code. In this paper, we compare six gradient implementations of the phylogenetic likelihood functions, in isolation and also as part of a variational inference procedure. We find that although automatic differentiation can scale approximately linearly in tree size, it is much slower than the carefully implemented gradient calculation for tree likelihood and ratio transformation operations. We conclude that a mixed approach combining phylogenetic libraries with machine learning libraries will provide the optimal combination of speed and model flexibility moving forward.

Comment: 15 pages and 2 figures in main text, plus supplementary material

arXiv.org e-Print Archive

Crossref

eScholarship - University of California
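
To illustrate what a hand-derived likelihood gradient looks like in the simplest phylogenetic setting, the sketch below differentiates a two-sequence JC69 log-likelihood with respect to its single branch length and checks the result against a central finite difference. This is a toy stand-in under stated assumptions: it uses neither automatic differentiation nor any of the six implementations compared in the paper, and the function names and data values are hypothetical.

```python
import math


def jc69_log_likelihood(t, n_sites, n_diff):
    """Log-likelihood of two aligned sequences at branch length t > 0 under JC69."""
    decay = math.exp(-4.0 * t / 3.0)
    log_same = math.log(0.25 * (0.25 + 0.75 * decay))
    log_diff = math.log(0.25 * (0.25 - 0.25 * decay))
    return (n_sites - n_diff) * log_same + n_diff * log_diff


def jc69_gradient(t, n_sites, n_diff):
    """Analytic derivative of jc69_log_likelihood with respect to t.

    With e = exp(-4t/3): d/dt = -(n - k) * 4e / (1 + 3e) + k * 4e / (3 * (1 - e)).
    """
    decay = math.exp(-4.0 * t / 3.0)
    same_term = -(n_sites - n_diff) * 4.0 * decay / (1.0 + 3.0 * decay)
    diff_term = n_diff * 4.0 * decay / (3.0 * (1.0 - decay))
    return same_term + diff_term


def finite_difference(t, n_sites, n_diff, h=1e-6):
    """Central-difference approximation used only to check the analytic gradient."""
    upper = jc69_log_likelihood(t + h, n_sites, n_diff)
    lower = jc69_log_likelihood(t - h, n_sites, n_diff)
    return (upper - lower) / (2.0 * h)


if __name__ == "__main__":
    # Hypothetical data: 50 aligned sites, 5 of which differ.
    print(jc69_gradient(0.2, 50, 5), finite_difference(0.2, 50, 5))
```

The analytic version costs one exponential per evaluation, whereas the finite-difference check needs two full likelihood evaluations per parameter, a small-scale picture of why specialized gradient code can outperform generic approaches on large trees.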

Escherichia coli ST8196 is a novel, locally evolved, and extensively drug resistant pathogenic lineage within the ST131 clonal complex

Author: Cheong Elaine
Chowdhury Piklu Roy
Darling Aaron E
Djordjevic Steven P
Fourment Mathieu
Gottlieb Thomas
Hastak Priyanka
Merlino John
Myers Garry SA
Publication venue: Taylor & Francis
Publication date: 01/01/2020

The H30Rx subclade of Escherichia coli ST131 is a clinically important, globally dispersed pathogenic lineage that typically displays resistance to fluoroquinolones and extended spectrum β-lactams. Isolates EC233 and EC234, variants of ST131-H30Rx with a novel sequence type (ST) 8196, isolated from unrelated patients presenting with bacteraemia at a Sydney Hospital in 2014, are characterised here. EC233 and EC234 are phylogroup B2, serotype O25:H4A, and resistant to ampicillin, amoxicillin, cefoxitin, ceftazidime, ceftriaxone, ciprofloxacin, norfloxacin and gentamicin and are likely clonal. Both harbour an IncFII_2 plasmid (pSPRC_Ec234-FII) that carries most of the resistance genes on an IS26-associated translocatable unit, two small plasmids and a novel IncI1 plasmid (pSPRC_Ec234-I). SNP-based phylogenetic analysis of the core genome of representatives within the ST131 clonal complex places both isolates in a subclade with three clinical Australian ST131-H30Rx clade-C isolates. A MrBayes phylogeny analysis of EC233 and EC234 indicates that ST8196 shares a most recent common ancestor with ST131-H30Rx strain EC70 isolated from the same hospital in 2013. Our study identified genomic hallmarks that define the ST131-H30Rx subclade in the ST8196 isolates and highlights a need for unbiased genomic surveillance approaches to identify novel high-risk MDR E. coli pathogens that impact healthcare facilities.

UNSWorks