
8 research outputs found

Alignment-free phylogeny of whole genomes using underlying subwords

Author: A Apostolico
B Chor
C Iliopoulos
C Venter
D Critchlow
D Gusfield
D Wildman
Davide Verzotto
E Ukkonen
ES Martinsen
F Delsuc
GE Sims
GJD Smith
I Ulitsky
J Felsenstein
J Lin
J Thompson
JR Cole
M Comin
M Huynen
Matteo Comin
R Giancarlo
SG Kong
T Kopelowitz
T Shiino
TH Cormen
Publication venue: 'Springer Science and Business Media LLC'

Comparing, ranking, and filtering motifs with character classes: Application to biological sequences analysis

Author: Comin M.
Verzotto D.
Publication venue: Wiley
Publication date: 01/01/2014

This chapter characterizes motifs with character classes and then introduces the notion of motif priority for comparing and ranking different motifs. The authors introduce the concept of underlying motifs for filtering any set of motifs with character classes into a new set that is linear in size with respect to a reference sequence. They present an algorithm that computes this new set by exploiting these notions, and they prove several theoretical results that support the validity of these fundamental properties. Finally, they discuss some preliminary results on the identification of signals in protein sequences by means of underlying motifs. Most importantly, their motif priority rule, together with the notion of underlying motifs, has proved valuable for the analysis of biological sequences.

Archivio istituzionale della ricerca - Università di Padova

Classification of Protein Sequences by means of Irredundant Patterns

Author: Comin Matteo
D. Verzotto
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Abstract

Background: The classification of protein sequences using string algorithms provides valuable insights for protein function prediction. Several methods, based on a variety of different patterns, have been previously proposed. Almost all string-based approaches discover patterns that are not "independent," and therefore the associated scores count the contribution of patterns that cover the same region of a sequence multiple times.

Results: In this paper we use a class of patterns, called irredundant, that is specifically designed to address this issue. Loosely speaking, the set of irredundant patterns is the smallest class of "independent" patterns that can describe all common patterns in two sequences; thus they avoid overcounting. We present a novel discriminative method, called Irredundant Class, based on the statistics of irredundant patterns combined with the power of support vector machines.

Conclusion: Tests on benchmark data show that Irredundant Class outperforms most of the string algorithms previously proposed, and it achieves results as good as current state-of-the-art methods. Moreover, the footprints of the most discriminative irredundant patterns can be used to guide the identification of functional regions in protein sequences.

Springer - Publisher Connector

PubMed Central

Archivio istituzionale della ricerca - Università di Padova

The IrredundantClass Method for Remote Homology Detection of Protein Sequences

Author: Comin Matteo
D. Verzotto
Publication venue: 'Mary Ann Liebert Inc'
Publication date: 01/01/2011

The automatic classification of protein sequences into families is of great help for the functional prediction and annotation of new proteins. In this article, we present a method called Irredundant Class that addresses the remote homology detection problem. The best-performing methods for this problem are string kernels, which compute a similarity function between pairs of proteins based on their subsequence composition. We provide evidence that almost all string kernels are based on patterns that are not independent, and therefore the associated similarity scores are obtained using a set of redundant features, overestimating the similarity between the proteins. To specifically address this issue, we introduce the class of irredundant common patterns. Loosely speaking, the set of irredundant common patterns is the smallest class of independent patterns that can describe all common patterns in a pair of sequences. We present a classification method based on the statistics of these patterns, named Irredundant Class. Results on benchmark data show that Irredundant Class outperforms most of the string kernels previously proposed, and it achieves results as good as the current state-of-the-art method, Local Alignment, while using the same pairwise information only once.

Crossref

Archivio istituzionale della ricerca - Università di Padova

Understanding the microbial basis of body odor in pre-pubescent children and teenagers

Author: Brahma P
Hu P
Kong R
Lam T.H
Li J
Liu J
Liu P
Lu Y
Nagarajan N
Ng A.H.Q
Ong M
Schnell D
Swaile D
Tiesman J
Ton T.M.U
Verzotto D
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

DOI: 10.1186/s40168-018-0588-z (Microbiome 6121)

Directory of Open Access Journals

ScholarBank@NUS

OPTIMA: sensitive and accurate whole-genome alignment of error-prone genomic maps by combinatorial indexing and technology-agnostic statistical analysis

Author: A Califano
A Valouev
ASM Teo
D Sarkar
D Verzotto
ET Lam
F Yao
G Ganapathy
H Lin
JD Storey
L Mendelowitz
M Antoniotti
M Ray
MD Muggli
MS Waterman
N Nagarajan
R Li
RM Karp
S Gao
TS Anantharaman
Y Dong
Y Kawahara
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Theoretical and practical analyses in metagenomic sequence classification

Author: A Zielezinski
AB McIntyre
AS Teo
B Buchfink
C Quince
D Verzotto
DE Wood
DH Huson
DT Truong
F Breitwieser
F Garofalo
K Vervier
M Comin
R Ounit
SK Ames
TAK Freitas
TH Lam
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Metagenomics is the study of genomic sequences in a heterogeneous microbial sample taken, e.g., from soil, water, the human microbiome, or skin. One of the primary objectives of metagenomic studies is to assign a taxonomic identity to each read sequenced from a sample and then to estimate the abundance of the known clades. With ever-increasing metagenomic datasets from high-throughput sequencing technologies now readily available, several fast and accurate methods have been developed that can work with reasonable computing requirements. Here we provide an overview of the state-of-the-art methods for the classification of metagenomic sequences, especially highlighting theoretical factors that seem to correlate well with practical factors and could therefore be useful in the choice or development of a new method in experimental contexts. In particular, we emphasize that the information derived from the known genomes and eventually used in the learning and classification processes may create several experimental issues (mostly related to the amount of information used in the processes and its uniqueness, significance, and redundancy), and some of these issues are intrinsic to both current alignment-based approaches and compositional ones. This entails the need to develop efficient alignment-free methods that overcome such problems by combining the learning and classification processes in a single framework.

Crossref

Archivio della Ricerca - Università di Pisa

Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph

Author: A Bankevich
A Gurevich
A Valouev
B Teague
D Verzotto
DC Schwartz
DM Church
DR Zerbino
G Ganapathy
G Miclotte
JL Bentley
JM Shelton
JT Simpson
K Mukherjee
L Li
LM Mendelowitz
MD Muggli
P Chen
P Medvedev
PA Pevzner
RM Idury
S Chamala
S Reslewic
S Zhou
TS Anantharaman
W Pan
Y Dong
Y Peng
Publication venue: 'Springer Science and Business Media LLC'

Crossref