219 research outputs found

XenDB: Full length cDNA prediction and cross species mapping in Xenopus laevis

Author: Altmann Curtis R
Beckstette Michael
Brivanlou Ali H
Giegerich Robert
Sczyrba Alexander
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Research using the model system Xenopus laevis has provided critical insights into the mechanisms of early vertebrate development and cell biology. Large scale sequencing efforts have provided an increasingly important resource for researchers. To take full advantage of the available sequence, we have analyzed 350,468 Xenopus laevis Expressed Sequence Tags (ESTs) both to identify full length protein encoding sequences and to develop a unique database system to support comparative approaches between X. laevis and other model systems. DESCRIPTION: Using a suffix array based clustering approach, we have identified 25,971 clusters and 40,877 singleton sequences. Generation of a consensus sequence for each cluster resulted in 31,353 tentative contig and 4,801 singleton sequences. Using both BLASTX and FASTY comparison to five model organisms and the NR protein database, more than 15,000 sequences are predicted to encode full length proteins, and these have been matched to publicly available IMAGE clones where available. Each sequence has been compared to the KOG database and ~67% of the sequences have been assigned a putative functional category. Based on sequence homology to mouse and human, putative GO annotations have been determined. CONCLUSION: The results of the analysis have been stored in a publicly available database, XenDB. A unique capability of the database is the ability to batch upload cross species queries to identify potential Xenopus homologues and their associated full length clones. Examples are provided including mapping of microarray results and application of 'in silico' analysis. The ability to quickly translate the results of various species into 'Xenopus-centric' information should greatly enhance comparative embryological approaches. Supplementary material can be found at

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Publications at Bielefeld University

Structator: fast index-based search for RNA sequence-structure patterns

Author: Backofen Rolf
Beckstette Michael
Kurtz Stefan
Meyer Fernando
Will Sebastian
Publication venue: Springer Science and Business Media LLC
Publication date: 01/12/2010

Background: The secondary structure of RNA molecules is intimately related to their function and often more conserved than the sequence. Hence, the important task of searching databases for RNAs requires matching sequence-structure patterns. Unfortunately, current tools for this task have, in the best case, a running time that is only linear in the size of sequence databases. Furthermore, established index data structures for fast sequence matching, like suffix trees or arrays, cannot benefit from the complementarity constraints introduced by the secondary structure of RNAs. Results: We present a novel method and readily applicable software for time efficient matching of RNA sequence-structure patterns in sequence databases. Our approach is based on affix arrays, a recently introduced index data structure, preprocessed from the target database. Affix arrays support bidirectional pattern search, which is required for efficiently handling the structural constraints of the pattern. Structural patterns like stem-loops can be matched inside out, such that the loop region is matched first and then the pairing bases on the boundaries are matched consecutively. This allows base pairing information to be exploited for search space reduction and leads to an expected running time that is sublinear in the size of the sequence database. The incorporation of a new chaining approach in the search of RNA sequence-structure patterns enables the description of molecules folding into complex secondary structures with multiple ordered patterns. The chaining approach removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our method runs up to two orders of magnitude faster than previous methods. Conclusions: The presented method's sublinear expected running time makes it well suited for RNA sequence-structure pattern matching in large sequence databases. RNA molecules containing several stem-loop substructures can be described by multiple sequence-structure patterns and their matches are efficiently handled by a novel chaining method. Beyond our algorithmic contributions, we provide with Structator a complete and robust open-source software solution for index-based search of RNA sequence-structure patterns. The Structator software is available at http://www.zbh.uni-hamburg.de/Structator. Supported by the Deutsche Forschungsgemeinschaft (grant WI 3628/1-1).
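The inside-out matching order described in this abstract (loop first, then the pairing bases outward) can be illustrated with a small sketch. It is not Structator's implementation: it locates the loop with an ordinary left-to-right string search rather than an affix array, so it does not achieve the sublinear expected running time. The function name, the choice to accept G-U wobble pairs, and the example sequence are all assumptions made for illustration.

```python
from typing import List, Tuple

# Allowed base pairs; admitting G-U wobble pairs is an assumption of this sketch.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def match_stem_loop(seq: str, loop: str, stem_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) half-open intervals of stem-loops in seq.

    The loop sequence is matched first; the match is then extended one
    base pair at a time to the left and right until stem_len pairs are found.
    """
    matches = []
    pos = seq.find(loop)
    while pos != -1:
        left, right = pos - 1, pos + len(loop)
        paired = 0
        # Extend outward while the flanking bases are complementary.
        while (paired < stem_len and left >= 0 and right < len(seq)
               and (seq[left], seq[right]) in PAIRS):
            left -= 1
            right += 1
            paired += 1
        if paired == stem_len:
            matches.append((left + 1, right))
        pos = seq.find(loop, pos + 1)
    return matches


# Hypothetical example: stem GGC / GCC enclosing the loop GAAA.
print(match_stem_loop("ACGGCGAAAGCCUU", "GAAA", 3))
```

Checking complementarity during the outward extension rejects a candidate as soon as one pair fails, which is the search space reduction the abstract refers to. A bidirectional index would apply the same idea to all loop occurrences at once.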

DSpace@MIT

Crossref

Springer - Publisher Connector

FreiDok plus

PubMed Central

Publications at Bielefeld University

Significant speedup of database searches with HMMs by search space reduction with PSSM family models

Author: Beckstette Michael
Giegerich Robert
Homann Robert
Kurtz Stefan
Publication venue: Oxford University Press
Publication date: 01/01/2009

Motivation: Profile hidden Markov models (pHMMs) are currently the most popular modeling concept for protein families. They provide sensitive family descriptors, and sequence database searching with pHMMs has become a standard task in today's genome annotation pipelines. On the downside, searching with pHMMs is computationally expensive

CiteSeerX

PubMed Central

Publications at Bielefeld University

Leitfaden Vernetzung und Kooperation für Initiativen zur Förderung der Familienbildung

Author: Beckstette Wiebke
Bierschock Kurt P.
Rupp Marina
Publication venue: Universitätsbibliothek Bamberg
Publication date: 01/01/2002

SSOAR - Social Science Open Access Repository

MOODS: fast search for position weight matrix matches in DNA sequences

Author: Beckstette
Brown
C. Pizzi
E. Ukkonen
J. Korhonen
Lenhard
Matys
P. Martinmaki
P. Rastas
Staden
Stormo
Wu
Publication venue: Oxford University Press
Publication date: 01/01/2009

Summary: MOODS (MOtif Occurrence Detection Suite) is a software package for matching position weight matrices against DNA sequences. MOODS implements state-of-the-art online matching algorithms, achieving considerably faster scanning speed than with a simple brute-force search. MOODS is written in C++, with bindings for the popular BioPerl and Biopython toolkits. It can easily be adapted for different purposes and integrated into existing workflows. It can also be used as a C++ library
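For orientation, the "simple brute-force search" that this abstract uses as its baseline can be sketched as follows. This is not the MOODS API or any of its algorithms. The matrix values, the function name, and the threshold are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical 3-column log-odds matrix; the values are illustrative only.
PWM: List[Dict[str, float]] = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": 0.5},
]


def scan_brute_force(seq: str, pwm: List[Dict[str, float]],
                     threshold: float) -> List[Tuple[int, float]]:
    """Score every window of len(pwm) bases and keep those at or above threshold."""
    m = len(pwm)
    hits = []
    for start in range(len(seq) - m + 1):
        # Sum the column weights of the bases in this window.
        score = sum(pwm[j][seq[start + j]] for j in range(m))
        if score >= threshold:
            hits.append((start, score))
    return hits


print(scan_brute_force("TTACGACTA", PWM, 2.5))
```

Each window costs m lookups, so the scan takes time proportional to the sequence length times the matrix length. The faster methods mentioned in the abstract improve on this baseline.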

Crossref

PubMed Central

Archivio istituzionale della ricerca - Università di Padova

Fast index based algorithms and software for matching position specific scoring matrices

Author: A Kel
A Sandelin
B Dorohonceanu
D Weeks
G Castillo
H Gonnet
J Henikoff
J Kärkkäinen
K Quandt
L Goldstein
LR Murphy
M Abouelhoda
M Beckstette
M Gribskov
Michael Beckstette
N de Bruijn
N Hulo
P Embrechts
P Haverty
P Scordis
R Giegerich
R Staden
R Tatusov
Robert Giegerich
Robert Homann
S Kurtz
S Rahmann
S Rajasekaran
Stefan Kurtz
T Kasai
T Li
T Wu
TK Attwood
V Freschi
V Matys
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: In biological sequence analysis, position specific scoring matrices (PSSMs) are widely used to represent sequence motifs in nucleotide as well as amino acid sequences. Searching with PSSMs in complete genomes or large sequence databases is a common, but computationally expensive task. RESULTS: We present a new non-heuristic algorithm, called ESAsearch, to efficiently find matches of PSSMs in large databases. Our approach preprocesses the search space, e.g., a complete genome or a set of protein sequences, and builds an enhanced suffix array that is stored on file. This allows the searching of a database with a PSSM in sublinear expected time. Since ESAsearch benefits from small alphabets, we present a variant operating on sequences recoded according to a reduced alphabet. We also address the problem of non-comparable PSSM-scores by developing a method which allows the efficient computation of a matrix similarity threshold for a PSSM, given an E-value or a p-value. Our method is based on dynamic programming and, in contrast to other methods, it employs lazy evaluation of the dynamic programming matrix. We evaluated algorithm ESAsearch with nucleotide PSSMs and with amino acid PSSMs. Compared to the best previous methods, ESAsearch shows speedups of a factor between 17 and 275 for nucleotide PSSMs, and speedups up to factor 1.8 for amino acid PSSMs. Comparisons with the most widely used programs even show speedups by a factor of at least 3.8. Alphabet reduction yields an additional speedup factor of 2 on amino acid sequences compared to results achieved with the 20 symbol standard alphabet. The lazy evaluation method is also much faster than previous methods, with speedups of a factor between 3 and 330. CONCLUSION: Our analysis of ESAsearch reveals sublinear runtime in the expected case, and linear runtime in the worst case for sequences not shorter than |A|^m + m - 1, where m is the length of the PSSM and A is a finite alphabet. In practice, ESAsearch shows superior performance over the most widely used programs, especially for DNA sequences. The new algorithm for accurate on-the-fly calculations of thresholds has the potential to replace formerly used approximation approaches. Beyond the algorithmic contributions, we provide a robust, well documented, and easy to use software package, implementing the ideas and algorithms presented in this manuscript
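The threshold computation mentioned in this abstract (find the score cutoff that corresponds to a given p-value) can be sketched with a plain dynamic programming pass over the score distribution. This sketch evaluates the whole distribution, whereas the abstract describes lazy evaluation, so it illustrates the problem rather than the authors' method. Integer scores, the uniform background, the example matrix, and all function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def score_distribution(pssm: List[Dict[str, int]],
                       background: Dict[str, float]) -> Dict[int, float]:
    """Probability of each total score under an i.i.d. background model."""
    dist = {0: 1.0}
    for column in pssm:
        nxt = defaultdict(float)
        # Convolve the distribution so far with this column's weights.
        for score, prob in dist.items():
            for symbol, weight in column.items():
                nxt[score + weight] += prob * background[symbol]
        dist = dict(nxt)
    return dist


def threshold_for_pvalue(pssm: List[Dict[str, int]],
                         background: Dict[str, float],
                         p: float) -> Optional[int]:
    """Smallest score t with P(score >= t) <= p, or None if no score qualifies."""
    dist = score_distribution(pssm, background)
    tail = 0.0
    threshold = None
    for score in sorted(dist, reverse=True):
        tail += dist[score]
        if tail > p:
            break
        threshold = score
    return threshold


# Hypothetical 2-column integer PSSM over DNA with a uniform background.
PSSM = [{"A": 2, "C": 0, "G": 0, "T": -1}, {"A": 0, "C": 3, "G": -1, "T": 0}]
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
print(threshold_for_pvalue(PSSM, UNIFORM, 0.2))
```

Only the upper tail of the distribution is needed to find the threshold, which suggests why avoiding a full evaluation can save work.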

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Publications at Bielefeld University

CoryneRegNet 6.0—Updated database content, new analysis methods and novel features focusing on community demands

Author: A. Tauch
Baumbach
Beckstette
Cerdeira
J. Baumbach
J. Pauling
Munch
Neuweger
R. Rottger
Ruiz
Salgado
Schroder
V. Azevedo
Wittkop
Publication venue: Oxford University Press
Publication date: 01/01/2012

Post-genomic analysis techniques such as next-generation sequencing have produced vast amounts of data about micro organisms including genetic sequences, their functional annotations and gene regulatory interactions. The latter are genetic mechanisms that control a cell's characteristics, for instance, pathogenicity as well as survival and reproduction strategies. CoryneRegNet is the reference database and analysis platform for corynebacterial gene regulatory networks. In this article we introduce the updated version 6.0 of CoryneRegNet and describe the updated database content which includes, 6352 corynebacterial regulatory interactions compared with 4928 interactions in release 5.0 and 3235 regulations in release 4.0, respectively. We also demonstrate how we support the community by integrating analysis and visualization features for transiently imported custom data, such as gene regulatory interactions. Furthermore, with release 6.0, we provide easy-to-use functions that allow the user to submit data for persistent storage with the CoryneRegNet database. Thus, it offers important options to its users in terms of community demands. CoryneRegNet is publicly available at http://www.coryneregnet.de

Crossref

PubMed Central

Publications at Bielefeld University

MPG.PuRe

Recommended from our members

Impact of process temperature and organic loading rate on cellulolytic/hydrolytic biofilm microbiomes during biomethanation of ryegrass silage revealed by genome-centered metagenomics and metatranscriptomics

Author: Beckstette Michael
Blom Jochen
Derenkó Jaqueline
Henke Christian
Jost Carsten
Klocke Michael
Maus Irena
Pühler Alfred
Rademacher Antje
Rumming Madis
Schlüter Andreas
Sczyrba Alexander
Stolze Yvonne
Wibberg Daniel
Willenbücher Katharina
Publication venue: London : BioMed Central
Publication date: 01/01/2020

Background: Anaerobic digestion (AD) of protein-rich grass silage was performed in experimental two-stage two-phase biogas reactor systems at low vs. increased organic loading rates (OLRs) under mesophilic (37 °C) and thermophilic (55 °C) temperatures. To follow the adaptive response of the biomass-attached cellulolytic/hydrolytic biofilms at increasing ammonium/ammonia contents, genome-centered metagenomics and transcriptional profiling based on metagenome assembled genomes (MAGs) were conducted. Results: In total, 78 bacterial and archaeal MAGs representing the most abundant members of the communities, and featuring defined quality criteria were selected and characterized in detail. Determination of MAG abundances under the tested conditions by mapping of the obtained metagenome sequence reads to the MAGs revealed that MAG abundance profiles were mainly shaped by the temperature but also by the OLR. However, the OLR effect was more pronounced for the mesophilic systems as compared to the thermophilic ones. In contrast, metatranscriptome mapping to MAGs subsequently normalized to MAG abundances showed that under thermophilic conditions, MAGs respond to increased OLRs by shifting their transcriptional activities mainly without adjusting their proliferation rates. This is a clear difference compared to the behavior of the microbiome under mesophilic conditions. Here, the response to increased OLRs involved adjusting of proliferation rates and corresponding transcriptional activities. The analysis led to the identification of MAGs positively responding to increased OLRs. The most outstanding MAGs in this regard, obviously well adapted to higher OLRs and/or associated conditions, were assigned to the order Clostridiales (Acetivibrio sp.) for the mesophilic biofilm and the orders Bacteroidales (Prevotella sp. and an unknown species), Lachnospirales (Herbinix sp. and Kineothrix sp.) and Clostridiales (Clostridium sp.) for the thermophilic biofilm. 
Genome-based metabolic reconstruction and transcriptional profiling revealed that positively responding MAGs mainly are involved in hydrolysis of grass silage, acidogenesis and/or acetogenesis. Conclusions: An integrated-omics approach enabled the identification of new AD biofilm keystone species featuring outstanding performance under stress conditions such as increased OLRs. Genome-based knowledge on the metabolic potential and transcriptional activity of responsive microbiome members will contribute to the development of improved microbiological AD management strategies for biomethanation of renewable biomass. © 2020 The Author(s)

Repositorium für Naturwissenschaften und Technik (TIB Hannover)

Efficient and accurate P-value computation for Position Weight Matrices

Author: A Liefooghe
C Pizzi
E Wingender
G Bejerano
GE Crooks
GZ Hertz
H Huang
Hélène Touzet
J Zhang
Jean-Stéphane Varré
JM Claverie
K Malde
M Beckstette
M Garey
R Staden
S Mount
S Rahmann
TD Wu
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Position Weight Matrices (PWMs) are probabilistic representations of signals in sequences. They are widely used to model approximate patterns in DNA or in protein sequences. Using PWMs requires, as a prerequisite, knowing the statistical significance of a word according to its score. This is done by defining the P-value of a score, which is the probability that the background model can achieve a score larger than or equal to the observed value. This gives rise to the following problem: given a P-value, find the corresponding score threshold. Existing methods rely on dynamic programming or probability generating functions. For many examples of PWMs, they fail to give accurate results in a reasonable amount of time. Results: The contribution of this paper is twofold. First, we study the theoretical complexity of the problem, and we prove that it is NP-hard. Then, we describe a novel algorithm that solves the P-value problem efficiently. The main idea is to use a series of discretized score distributions that improves the final result step by step until some convergence criterion is met. Moreover, the algorithm is capable of calculating the exact P-value without any error, even for matrices with non-integer coefficient values. The same approach is also used to devise an accurate algorithm for the reverse problem: finding the P-value for a given score. Both methods are implemented in software called TFM-PVALUE, which is freely available. Conclusion: We have tested TFM-PVALUE on a large set of PWMs representing transcription factor binding sites. Experimental results show that it achieves better performance in terms of computational time and precision than existing tools.
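The P-value definition given in this abstract can be checked directly on very small matrices by enumerating every word. The sketch below is a definition check only, not the TFM-PVALUE algorithm: its cost grows as the alphabet size raised to the matrix length, which is the cost that practical methods are designed to avoid. The matrix, the background, and the function name are hypothetical.

```python
from itertools import product
from typing import Dict, List


def pvalue_by_enumeration(pwm: List[Dict[str, float]],
                          background: Dict[str, float],
                          score: float) -> float:
    """P(word score >= score) under an i.i.d. background, by full enumeration."""
    alphabet = sorted(background)
    total = 0.0
    for word in product(alphabet, repeat=len(pwm)):
        word_score = sum(column[ch] for column, ch in zip(pwm, word))
        if word_score >= score:
            # Probability of this word under the background model.
            prob = 1.0
            for ch in word:
                prob *= background[ch]
            total += prob
    return total


# Hypothetical 2-column matrix with non-integer weights and a uniform background.
SMALL_PWM = [
    {"A": 1.5, "C": -0.5, "G": -0.5, "T": -0.5},
    {"A": -0.25, "C": 0.75, "G": -0.25, "T": -0.25},
]
UNIFORM_BG = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
print(pvalue_by_enumeration(SMALL_PWM, UNIFORM_BG, 1.0))
```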

HAL - Lille 3

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

INRIA a CCSD electronic archive server

PubMed Central

HAL: Hyper Article en Ligne

Lightweight comparison of RNAs based on exact sequence–structure matches

Author: Allali
Altschul
Backofen
Bafna
Bahr
Bauer
Blin
Cannone
Evans
Gardner
Griffiths-Jones
Havgaard
Hentze
Hofacker
Huttenhofer
Höchsmann
Jiang
Lin
Martineau
Mathews
Michael Beckstette
Otto
Rolf Backofen
Sankoff
Sebastian Will
Serganov
Steffen Heyne
Torarinsson
Will
Wilm
Wilting
Zhang
Publication venue: Oxford University Press
Publication date: 01/01/2009

Motivation: Specific functions of ribonucleic acid (RNA) molecules are often associated with different motifs in the RNA structure. The key feature that forms such an RNA motif is the combination of sequence and structure properties. In this article, we introduce a new RNA sequence–structure comparison method which maintains exact matching substructures. Existing common substructures are treated as whole units while variability is allowed between such structural motifs

Publications at Bielefeld University