48 research outputs found

Fast Algorithms for Large-Scale Phylogenetic Reconstruction

Author: Truszkowski Jakub
Publication venue: University of Waterloo
Publication date: 01/01/2013

One of the most fundamental computational problems in biology is that of inferring evolutionary histories of groups of species from sequence data. Such evolutionary histories, known as phylogenies, are usually represented as binary trees where leaves represent extant species, whereas internal nodes represent their shared ancestors. As the amount of sequence data available to biologists increases, very fast phylogenetic reconstruction algorithms are becoming necessary. Currently, large sequence alignments can contain up to hundreds of thousands of sequences, making traditional methods, such as Neighbor Joining, computationally prohibitive. To address this problem, we have developed three novel fast phylogenetic algorithms. The first algorithm, QTree, is a quartet-based heuristic that runs in O(n log n) time. It is based on a theoretical algorithm that reconstructs the correct tree, with high probability, assuming every quartet is inferred correctly with constant probability. The core of our algorithm is a balanced search tree structure that enables us to locate an edge in the tree in O(log n) time. Our algorithm is several times faster than all the current methods, while its accuracy approaches that of Neighbor Joining. The second algorithm, LSHTree, is the first sub-quadratic time algorithm with theoretical performance guarantees under a Markov model of sequence evolution. Our new algorithm runs in O(n^{1+γ(g)} log^2 n) time, where γ is an increasing function of an upper bound on the mutation rate along any branch in the phylogeny, and γ(g) < 1 for all g. For phylogenies with very short branches, the running time of our algorithm is close to linear. In experiments, our prototype implementation was more accurate than the current fast algorithms, while being comparably fast. In the final part of this thesis, we apply the algorithmic framework behind LSHTree to the problem of placing large numbers of short sequence reads onto a fixed phylogenetic tree. Our initial results in this area are promising, but there are still many challenges to be resolved.

University of Waterloo's Institutional Repository

Rapidly Computing the Phylogenetic Transfer Index

Author: Gascuel Olivier
Swenson Krister M.
Truszkowski Jakub
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 19th International Workshop on Algorithms in Bioinformatics (WABI 2019)
Publication date: 01/01/2019

Given trees T and T_o on the same taxon set, the transfer index phi(b,T_o) is the number of taxa that need to be ignored so that the bipartition induced by branch b in T is equal to some bipartition in T_o. Recently, Lemoine et al. [Lemoine et al., 2018] used the transfer index to design a novel bootstrap analysis technique that improves on Felsenstein's bootstrap on large, noisy data sets. In this work, we propose an algorithm that computes the transfer index for all branches b in T in O(n log^3 n) time, which improves upon the current O(n^2)-time algorithm by Lin, Rajan and Moret [Lin et al., 2012]. Our implementation is able to process pairs of trees with hundreds of thousands of taxa in minutes and considerably speeds up the method of Lemoine et al. on large data sets. We believe our algorithm can be useful for comparing large phylogenies, especially when some taxa are misplaced (e.g., due to horizontal gene transfer, recombination, or reconstruction errors).
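
The definition of the transfer index in this abstract can be illustrated with a brute-force sketch. This is not the O(n log^3 n) algorithm the paper proposes, nor the O(n^2) one it cites; it simply compares one bipartition against every bipartition of the reference tree, which is far slower. The nested-tuple tree encoding and the function names (clades, bipartitions, transfer_distance, transfer_index) are assumptions made for this example only. Leaf (trivial) bipartitions of the reference tree are included in the comparison.

```python
from typing import FrozenSet, Iterable, List, Tuple, Union

# Hypothetical encoding: a tree is a leaf name or a tuple of subtrees.
Tree = Union[str, tuple]


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Return the leaf set below every node; the root's leaf set comes last."""
    found: List[FrozenSet[str]] = []

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        found.append(leaves)
        return leaves

    visit(tree)
    return found


def bipartitions(tree: Tree) -> Tuple[FrozenSet[str], List[FrozenSet[str]]]:
    """Return (taxon set, one side of each bipartition induced by a branch)."""
    all_clades = clades(tree)
    taxa = all_clades[-1]
    # Drop the root clade: it splits the taxa into (everything, nothing).
    return taxa, [c for c in all_clades if len(c) < len(taxa)]


def transfer_distance(side: FrozenSet[str], other: FrozenSet[str], n_taxa: int) -> int:
    """Taxa to ignore so that two bipartitions coincide.

    A bipartition can be named by either of its sides, so the count is the
    smaller of the mismatch against `other` and against its complement.
    """
    mismatches = len(side ^ other)
    return min(mismatches, n_taxa - mismatches)


def transfer_index(side: FrozenSet[str], reference: Iterable[FrozenSet[str]], n_taxa: int) -> int:
    """Smallest transfer distance from `side` to any reference bipartition."""
    return min(transfer_distance(side, ref, n_taxa) for ref in reference)


# Toy trees on the taxa A..E (made-up data).
t = (("A", "B"), ("C", ("D", "E")))
t_o = (("A", "C"), ("B", ("D", "E")))

taxa, reference = bipartitions(t_o)
print(transfer_index(frozenset({"A", "B"}), reference, len(taxa)))  # 1
print(transfer_index(frozenset({"D", "E"}), reference, len(taxa)))  # 0
```

In the toy example, the branch separating {A, B} from the rest of T has index 1 because ignoring B leaves the trivial bipartition around A, which is present in T_o, while the branch above {D, E} appears unchanged in T_o and has index 0.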

HAL Descartes

DROPS Dagstuhl Research Online Publication Server

Portail HAL Um (Université de Montpellier)

HAL: Hyper Article en Ligne

HAL-Pasteur

Hal-Diderot

New decoding algorithms for Hidden Markov Models using distance measures on labellings

Author: A Krogh
B Brejová
Daniel G Brown
ELL Sonnhammer
GE Tusnady
Jakub Truszkowski
L Käll
M Stanke
P Fariselli
R Durbin
SL Cawley
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Existing hidden Markov model decoding algorithms do not focus on approximately identifying the sequence feature boundaries. Results: We give a set of algorithms to compute the conditional probability of all labellings "near" a reference labelling λ for a sequence y for a variety of definitions of "near". In addition, we give optimization algorithms to find the best labelling for a sequence in the robust sense of having all of its feature boundaries nearly correct. Natural problems in this domain are NP-hard to optimize. For membrane proteins, our algorithms find the approximate topology of such proteins with comparable success to existing programs, while being substantially more accurate in estimating the positions of transmembrane helix boundaries. Conclusion: More robust HMM decoding may allow for better analysis of sequence features, in reasonable runtimes.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A machine learning approach to estimating the geographical origin of timber

Author: Truszkowski Jakub
Publication venue: Center for Open Science
Publication date

Ezid

Maximum Likelihood Phylogenetic Inference is Consistent on Multiple Sequence Alignments, with or without Gaps

Author: Jakub Truszkowski
Nick Goldman
Publication venue: Oxford University Press (OUP)
Publication date: 28/11/2015

We prove that maximum likelihood phylogenetic inference is consistent on gapped multiple sequence alignments (MSAs) as long as substitution rates across each edge are greater than zero, under mild assumptions on the structure of the alignment. Under these assumptions, maximum likelihood will asymptotically recover the tree with edge lengths corresponding to the mean number of substitutions per site on each edge. This refutes Warnow's recent suggestion (Warnow 2012) that maximum likelihood phylogenetic inference might be statistically inconsistent when gaps are treated as missing data, even if the MSA is correct. We also derive a simple new proof of maximum likelihood consistency of ungapped alignments.

Crossref

PubMed Central

Towards a Practical O(n log n) Phylogeny Algorithm

Author: Daniel G. Brown
Jakub Truszkowski
Publication venue: Springer Berlin Heidelberg
Publication date: 01/01/2011

Crossref