
LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

Servicio de Difusión de la Creación Intelectual

The Francis Crick Institute

Occurrence of immature forms of culicids (Insecta: Diptera) in the northeastern region of Brazil

Author: Almirón WR
Alves LC
Araújo MS
Ayres M
Branco AS
Carvalho GA
Consoli RAGB
Dibo MR
Fischer S
Fontes G
Forattini OP
Gomes AC
Guedes MLP
Lira-Vieira AR
Medeiros Z
Montes J
Nunes TC
Pinto DM
Ramos RAN
Reinert JF
Rezende HR
Silva RC
Taipe-Lagos CB
Publication venue: FapUNIFESP (SciELO)

MISHIMA - a new method for high speed multiple alignment of nucleotide sequences of bacterial genome scale data

Author: ACC Shih
AL Delcher
B Morgenstern
C Notredame
D Mikhailov
DF Feng
DG Higgins
DJ Lipman
F Corpet
GJ Barton
J Cheetham
J Stoye
JD Thompson
K Katoh
K Kryukov
K Reinert
KB Li
Kirill Kryukov
M Brudno
M Kimura
N Bray
Naruya Saitou
O Gotoh
RC Edgar
U Tonges
WR Taylor
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract Background Large nucleotide sequence datasets are becoming increasingly common objects of comparison. Complete bacterial genomes are reported almost every day. This creates challenges for developing new multiple sequence alignment methods. Conventional multiple alignment methods are based on pairwise alignment and/or progressive alignment techniques. These approaches have performance problems when the number of sequences is large and when dealing with genome-scale sequences. Results We present a new method of multiple sequence alignment, called MISHIMA (Method for Inferring Sequence History In terms of Multiple Alignment), that does not depend on pairwise sequence comparison. A new algorithm is used to quickly find rare oligonucleotide sequences shared by all sequences. A divide-and-conquer approach is then applied to break the sequences into fragments that can be aligned independently by an external alignment program. These partial alignments are assembled to form a complete alignment of the original sequences. Conclusions MISHIMA provides improved performance compared to the commonly used multiple alignment methods. As an example, six complete genome sequences of the bacterial species Helicobacter pylori (about 1.7 Mb each) were successfully aligned in about 6 hours using a single PC.
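The anchor-and-split idea described in this abstract can be illustrated with a small Python sketch. It is not MISHIMA's algorithm: as simplifying assumptions, "rare" is taken to mean a k-mer that occurs exactly once in every sequence, anchors are chosen greedily in the order they appear in the first sequence, and the helper names (`unique_kmers`, `shared_anchors`, `split_at_anchors`) are hypothetical. The sketch shows how sequences can be cut into corresponding fragments that an external aligner could then process independently.

```python
from collections import Counter


def unique_kmers(seq, k):
    """Map every k-mer that occurs exactly once in seq to its start position."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {seq[i:i + k]: i
            for i in range(len(seq) - k + 1)
            if counts[seq[i:i + k]] == 1}


def shared_anchors(seqs, k):
    """Return one tuple of start positions per anchor (a k-mer unique in every
    sequence). Assumption: anchors are kept greedily, in first-sequence order,
    only if they stay non-overlapping and collinear in all sequences."""
    maps = [unique_kmers(s, k) for s in seqs]
    shared = set(maps[0]).intersection(*maps[1:])
    anchors, last = [], [-k] * len(seqs)
    for kmer in sorted(shared, key=lambda km: maps[0][km]):
        pos = [m[kmer] for m in maps]
        if all(p >= prev + k for p, prev in zip(pos, last)):
            anchors.append(tuple(pos))
            last = pos
    return anchors


def split_at_anchors(seqs, k):
    """Cut each sequence into gap / anchor / gap / ... fragments. Fragment i of
    every sequence corresponds to fragment i of the others, so each group of
    fragments could be aligned separately and the results concatenated."""
    anchors = shared_anchors(seqs, k)
    result = []
    for idx, s in enumerate(seqs):
        pieces, start = [], 0
        for anchor in anchors:
            cut = anchor[idx]
            pieces.append(s[start:cut])    # unaligned stretch before the anchor
            pieces.append(s[cut:cut + k])  # the shared anchor itself
            start = cut + k
        pieces.append(s[start:])           # tail after the last anchor
        result.append(pieces)
    return result


print(split_at_anchors(["CCGATTACATT", "GGGGATTACAC"], k=7))
# [['CC', 'GATTACA', 'TT'], ['GGG', 'GATTACA', 'C']]
```

The greedy pass keeps the sketch short; a practical tool would need a more careful selection of collinear anchors, because a single badly placed anchor blocks every later candidate.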

Institutional Repository of the Freie Universität Berlin

STELLAR: fast and exact local alignments

Author: A Döring
A Gogol-Döring
A Marzal
A Mortazavi
AE Darling
AN Arslan
B Langmead
B Paten
B Raphael
Birte Kehr
D Weese
David Weese
H Jiang
H Li
I Dubchak
Knut Reinert
KR Rasmussen
M Blanchette
MS Waterman
P Jokinen
PH Sellers
R Li
S Burkhardt
S Karlin
S Rumble
S Schwartz
S Tweedie
SF Altschul
TF Smith
TW Lam
WJ Kent
WR Pearson
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract Background Large-scale comparison of genomic sequences requires reliable tools for the search of local alignments. Practical local aligners are in general fast but heuristic, and hence sometimes miss significant matches. Results We present here the local pairwise aligner STELLAR, which has full sensitivity for ε-alignments, i.e. it is guaranteed to report all local alignments of a given minimal length and maximal error rate. The aligner is composed of two steps, filtering and verification. We apply the SWIFT algorithm for lossless filtering and have developed a new verification strategy that we prove to be exact. Our results on simulated and real genomic data confirm and quantify the conjecture that heuristic tools like BLAST or BLAT miss a large percentage of significant local alignments. Conclusions STELLAR is very practical and fast on very long sequences, which makes it a suitable new tool for finding local alignments between genomic sequences under the edit distance model. Binaries are freely available for Linux, Windows, and Mac OS X at http://www.seqan.de/projects/stellar. The source code is freely distributed with the SeqAn C++ library version 1.3 and later at http://www.seqan.de.
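The two ingredients named in this abstract, an ε-alignment criterion and a q-gram-based lossless filter, can be made concrete with a short Python sketch. It is not STELLAR's filter or verification: the brute-force edit distance below stands in for STELLAR's own verification strategy, and the definition in `is_eps_match` (length measured on the second substring, at most ε times that length in edits) is an assumption. The q-gram threshold is the classical q-gram lemma that q-gram filters such as SWIFT build on; all function names are hypothetical.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance with unit-cost substitutions, insertions, deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def is_eps_match(a, b, eps, min_len):
    """Assumed definition: |b| >= min_len and edit distance <= eps * |b|."""
    return len(b) >= min_len and edit_distance(a, b) <= eps * len(b)


def qgram_threshold(n, q, errors):
    """q-gram lemma: if b has length n and is within `errors` edits of a, then
    at least n + 1 - q * (errors + 1) of b's q-grams also occur in a."""
    return n + 1 - q * (errors + 1)


def shared_qgram_positions(a, b, q):
    """Count positions in b whose q-gram also occurs somewhere in a."""
    grams_a = {a[i:i + q] for i in range(len(a) - q + 1)}
    return sum(b[j:j + q] in grams_a for j in range(len(b) - q + 1))


a, b = "ACGTACGTAC", "ACGTTCGTAC"            # one substitution apart
errors = math.floor(0.1 * len(b))            # errors allowed at eps = 0.1
print(is_eps_match(a, b, eps=0.1, min_len=10))                            # True
print(shared_qgram_positions(a, b, 3), qgram_threshold(len(b), 3, errors))  # 5 5
```

A filter only needs to examine regions whose shared q-gram count reaches the threshold, which is why it can be lossless: no true ε-match can fall below that count.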

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

SeqAn An efficient, generic C++ library for sequence analysis

Author: A Darling
A Fabri
A Halpern
Andreas Döring
C Notredame
D Butt
D Vandevoorde
David Weese
DS Hirschberg
EW Myers
G Myers
G Navarro
J Dutheil
J Kececioglu
J Stajich
JC Venter
K Czarnecki
K Mehlhorn
Knut Reinert
M Abouelhoda
M Brudno
M Höhl
M Li
M Pocock
M Wilson
MH Austern
MH Overmars
MI Abouelhoda
N Saitou
O Gotoh
P Bieganski
P Weiner
R Giegerich
RJ Mural
S Burkhardt
S Kurtz
SB Needleman
SF Altschul
TH Cormen
Tobias Rausch
U Manber
W Vahrson
WR Pitt
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract Background The use of novel algorithmic techniques is pivotal to many important problems in life science. For example, the sequencing of the human genome [1] would not have been possible without advanced assembly algorithms. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there is a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use. Results To remedy this trend, we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn comprises implementations of existing, practical, state-of-the-art algorithmic components to provide a sound basis for algorithm testing and development. In this paper we describe the design and content of SeqAn and demonstrate its use with two examples. In the first example we show an application of SeqAn as an experimental platform by comparing different exact string matching algorithms. The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. Results indicate that our implementation is very efficient and versatile. Conclusion We anticipate that SeqAn greatly simplifies the rapid development of new bioinformatics tools by providing a collection of readily usable, well-designed algorithmic components which are fundamental for the field of sequence analysis. This not only facilitates the implementation of new algorithms but also enables a sound analysis and comparison of existing algorithms.
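To illustrate what comparing exact string matching algorithms involves, here is a minimal Python sketch of Horspool's algorithm, one classical exact matcher, checked against a naive scan. It is illustrative only: it does not use SeqAn's C++ interface, it is not a claim about which algorithms the paper benchmarked, and the function names are hypothetical.

```python
def naive_search(text, pattern):
    """Reference matcher: test every alignment position directly."""
    m = len(pattern)
    if m == 0:
        return []
    return [i for i in range(len(text) - m + 1) if text.startswith(pattern, i)]


def horspool(text, pattern):
    """Horspool's algorithm: after each window comparison, shift by an amount
    looked up from the text character aligned with the pattern's last slot."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Distance from the last occurrence of each character (excluding the final
    # pattern position) to the end of the pattern; later occurrences overwrite
    # earlier ones, so the smallest safe shift is kept. Other characters shift m.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits, pos = [], 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        pos += shift.get(text[pos + m - 1], m)
    return hits


print(horspool("GATTACAGATTACA", "TACA"))  # [3, 10]
```

Because every shift is at least one position and never skips past a possible occurrence, Horspool reports exactly the same matches as the naive scan while usually inspecting far fewer windows, which is the kind of trade-off such a benchmark measures.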

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

An analysis of single amino acid repeats as use case for application specific background models

Author: C Notredame
David P Kreil
DP Depledge
DP Kreil
E Birney
E Delot
EL Sonnhammer
EM Marcotte
G Gouridis
G Nuel
G Reinert
H Gerber
H Nielsen
IB Kuznetsov
J Thompson
J Wootton
J Xie
JD Bendtsen
JM Hancock
JW Fondon
L Brown
L Zhang
M Hoebeke
M Mar Alba
M Thomas-Chollier
M Tipping
MA Huntley
O Weiss
OB Ptitsyn
P Siwach
Paweł P Łabaj
Peter Sykacek
PP Łabaj
R Lopez
R Lyne
RI Sadreyev
RS Hegde
S Caburet
S Hands
S Henikoff
S Karlin
SF Altschul
T Koestler
VJ Promponas
VR Chechetkin
VS Pande
WR Pearson
Y Kashi
Publication venue: BioMed Central
Publication date: 01/01/2011

Background Sequence analysis aims to identify biologically relevant signals against a backdrop of functionally meaningless variation. Increasingly, it is recognized that the quality of the background model directly affects the performance of analyses. State-of-the-art approaches rely on classical sequence models that are adapted to the studied dataset. Although these models perform well in the analysis of globular protein domains, they break down in regions of stronger compositional bias or low complexity. While these regions are typically filtered, there is increasing anecdotal evidence of functional roles. This motivates an exploration of more complex sequence models and application-specific approaches for the investigation of biased regions. Results Traditional Markov chains and application-specific regression models are compared using the example of predicting runs of single amino acids, a particularly simple class of biased regions. Cross-fold validation experiments reveal that the alternative regression models capture the multivariate trends well despite their low dimensionality, unlike even higher-order Markov predictors. We show how the significance of unusual observations can be computed for such empirical models. The power of a dedicated model in the detection of biologically interesting signals is then demonstrated in an analysis identifying the unexpected enrichment of contiguous leucine repeats in signal peptides. Considering different reference sets, we show how the question examined actually defines what constitutes the 'background'. Results can thus be highly sensitive to the choice of appropriate model training sets. Conversely, the choice of reference data determines the questions that can be investigated in an analysis.
Conclusions Using a specific case of studying biased regions as an example, we have demonstrated that the construction of application-specific background models is both necessary and feasible in a challenging sequence analysis situation.
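The traditional baseline mentioned in this abstract, a Markov-chain background model for runs of a single amino acid, can be sketched in a few lines of Python. This covers only a first-order Markov chain, not the authors' application-specific regression models. Assumptions: probabilities are plain maximum-likelihood estimates without pseudocounts, the empirical residue composition is treated as the chain's stationary distribution, and the function names are hypothetical.

```python
from collections import Counter


def fit_first_order(seqs):
    """Maximum-likelihood residue frequencies and first-order transition
    probabilities P(next | current), estimated without pseudocounts."""
    comp, pairs = Counter(), Counter()
    for s in seqs:
        comp.update(s)
        pairs.update(zip(s, s[1:]))
    total = sum(comp.values())
    freq = {a: c / total for a, c in comp.items()}
    out = Counter()                      # transitions leaving each residue
    for (x, _), c in pairs.items():
        out[x] += c
    trans = {(x, y): c / out[x] for (x, y), c in pairs.items()}
    return freq, trans


def expected_run_count(freq, trans, residue, run_len, seq_len):
    """Expected number of positions that start `run_len` consecutive copies of
    `residue` in a sequence of length `seq_len`, assuming a stationary chain
    whose marginal distribution is approximated by `freq`."""
    if run_len < 1 or run_len > seq_len:
        return 0.0
    p_first = freq.get(residue, 0.0)
    p_stay = trans.get((residue, residue), 0.0)
    return (seq_len - run_len + 1) * p_first * p_stay ** (run_len - 1)


freq, trans = fit_first_order(["LLLA", "ALLA"])  # toy training set
print(expected_run_count(freq, trans, "L", 3, 10))
```

Comparing such an expectation with the observed number of runs is one way to flag unusual observations; as the abstract stresses, the result depends heavily on which sequences are used to train the background.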