Search CORE

2,260 research outputs found

Detecting and comparing non-coding RNAs in the high-throughput era.

Author: Bussotti Giovanni
Enright Anton J
Notredame Cedric
Publication venue: Int J Mol Sci
Publication date: 01/01/2013

In recent years there has been growing interest in the field of non-coding RNA. This surge is a direct consequence of the discovery of a huge number of new non-coding genes and of the finding that many of these transcripts are involved in key cellular functions. In this context, accurately detecting and comparing RNA sequences has become important. Aligning nucleotide sequences is a key requirement when searching for homologous genes. Accurate alignments reveal evolutionary relationships, conserved regions and, more generally, any biologically relevant pattern. Comparing RNA molecules is, however, a challenging task. The nucleotide alphabet is simpler and therefore less informative than that of amino acids. Moreover, for many non-coding RNAs, evolution is likely to be constrained mostly at the structural level rather than at the sequence level. This results in very poor sequence conservation, which impedes the comparison of these molecules. These difficulties define a context where new methods are urgently needed in order to exploit experimental results to their full potential. This review focuses on the comparative genomics of non-coding RNAs in the context of new sequencing technologies, and deals especially with two extremely important and timely research topics: the development of new methods to align RNAs and the analysis of high-throughput data.
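A back-of-the-envelope illustration of why a four-letter alphabet carries less signal per position than the twenty-letter protein alphabet. It assumes uniform residue frequencies, which real sequences do not have, so the numbers are upper bounds rather than measured values; the function names are illustrative only.

```python
import math

def max_bits_per_position(alphabet_size: int) -> float:
    """Maximum information one aligned position can carry: log2(alphabet size)."""
    return math.log2(alphabet_size)

def chance_identity(alphabet_size: int) -> float:
    """Probability that two random residues are identical, assuming uniform frequencies."""
    return 1.0 / alphabet_size

rna_bits = max_bits_per_position(4)        # A, C, G, U
protein_bits = max_bits_per_position(20)   # the 20 standard amino acids
print(f"bits per position: RNA {rna_bits:.2f}, protein {protein_bits:.2f}")
print(f"chance identity:   RNA {chance_identity(4):.0%}, protein {chance_identity(20):.0%}")
```

Under these assumptions an RNA position carries at most 2 bits against about 4.32 for protein, and two unrelated RNA sequences already agree at roughly one position in four by chance, which is part of why sequence identity alone is a weak homology signal for ncRNAs.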

Multidisciplinary Digital Publishing Institute

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Apollo (Cambridge)

R-Coffee: a web server for accurately aligning noncoding RNA sequences

Author: Higgins Desmond G.
Moretti Sébastien
Notredame Cédric
Wilm Andreas
Xenarios Ioannis
Publication date: 02/08/2017

The R-Coffee web server produces highly accurate multiple alignments of noncoding RNA (ncRNA) sequences, taking into account predicted secondary structures. R-Coffee uses a novel algorithm recently incorporated in the T-Coffee package. R-Coffee works along the same lines as T-Coffee: it uses pairwise or multiple sequence alignment (MSA) methods to compute a primary library of input alignments. The program then computes an MSA highly consistent with both the alignments contained in the library and the secondary structures associated with the sequences. The secondary structures are predicted using RNAplfold. The server provides two modes. The slow/accurate mode is restricted to small datasets (fewer than 5 sequences, each shorter than 150 nucleotides) and combines R-Coffee with Consan, a very accurate pairwise RNA alignment method. For larger datasets a fast mode (RM-Coffee) can be used, which uses R-Coffee to combine the outputs of the three programs found to perform best on RNA (MUSCLE, MAFFT and ProbConsRNA). Our BRAliBase benchmarks indicate that the R-Coffee/Consan combination is one of the best ncRNA alignment methods for short sequences, while RM-Coffee gives comparable results on longer sequences. The R-Coffee web server is available at http://www.tcoffee.or
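The "primary library" idea can be illustrated with the consistency-based library extension of the original T-Coffee algorithm: the weight of a residue pair (x, y) is increased by min(W(x,z), W(z,y)) for every residue z in a third sequence that is aligned to both. This is a minimal sketch; R-Coffee's additional use of RNAplfold secondary structures is deliberately omitted, and the toy weights are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

Residue = tuple[str, int]   # (sequence name, 0-based residue index)
Pair = frozenset            # unordered pair of Residues

def extend_library(primary: dict[Pair, float]) -> dict[Pair, float]:
    """Add min(W(x,z), W(z,y)) to W(x,y) for each residue z linked to both x and y."""
    neighbours: dict[Residue, dict[Residue, float]] = defaultdict(dict)
    for pair, weight in primary.items():
        a, b = tuple(pair)
        neighbours[a][b] = weight
        neighbours[b][a] = weight
    extended = dict(primary)
    for partners in neighbours.values():           # z is the shared intermediate residue
        for x, y in combinations(partners, 2):
            if x[0] == y[0]:
                continue  # only residues from different sequences can be aligned
            key = frozenset((x, y))
            extended[key] = extended.get(key, 0.0) + min(partners[x], partners[y])
    return extended

primary = {
    frozenset({("s1", 0), ("s2", 0)}): 0.5,
    frozenset({("s1", 0), ("s3", 0)}): 0.75,
    frozenset({("s2", 0), ("s3", 0)}): 0.25,
}
extended = extend_library(primary)
for pair, weight in sorted(extended.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), weight)
```

Pairs supported through many intermediate sequences gain weight, so a subsequent progressive alignment driven by the extended library favours residue pairs that are consistent across the whole input set.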

RERO DOC Digital Library

Expresso: automatic incorporation of structural information in multiple sequence alignments using 3D-Coffee

Author: Armougom Fabrice
Audic Stéphane
Dumas Pierre
Keduas Vladimir
Moretti Sébastien
Notredame Cedric
Poirot Olivier
Schaeli Basile
Publication date: 02/08/2017

Expresso is a multiple sequence alignment server that aligns sequences using structural information. The user only needs to provide sequences. The server runs BLAST to identify close homologues of the sequences within the PDB database. These PDB structures are used as templates to guide the alignment of the original sequences using structure-based sequence alignment methods such as SAP or Fugue. The final result is a multiple sequence alignment of the original sequences based on the structural information of the templates. An advanced mode makes it possible to either upload private structures or specify which PDB templates should be used to model each sequence. Provided that suitable structural information is available, Expresso delivers sequence alignments with accuracy comparable to that of structure-based alignments. The server is available on http://www.tcoffee.or
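The template-selection step can be sketched as picking, for each query, the best-scoring PDB hit from a BLAST run with standard 12-column tabular output (`-outfmt 6`). The identity and E-value cut-offs below are hypothetical placeholders, not Expresso's documented criteria, and the PDB chain IDs are made up.

```python
BLAST_TAB_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                     "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def pick_templates(blast_tab: str, min_identity: float = 60.0,
                   max_evalue: float = 1e-5) -> dict[str, str]:
    """Map each query to its highest-bitscore hit that passes the (hypothetical) filters."""
    best: dict[str, tuple[float, str]] = {}
    for line in blast_tab.strip().splitlines():
        hit = dict(zip(BLAST_TAB_COLUMNS, line.split("\t")))
        identity = float(hit["pident"])
        evalue = float(hit["evalue"])
        bits = float(hit["bitscore"])
        if identity < min_identity or evalue > max_evalue:
            continue  # too distant to serve as a structural template
        query = hit["qseqid"]
        if query not in best or bits > best[query][0]:
            best[query] = (bits, hit["sseqid"])
    return {query: subject for query, (_, subject) in best.items()}

rows = [
    ["seqA", "1abc_A", "85.0", "120", "18", "0", "1", "120", "5", "124", "1e-50", "250"],
    ["seqA", "2xyz_B", "70.0", "110", "33", "0", "1", "110", "1", "110", "1e-30", "180"],
    ["seqB", "3def_C", "40.0", "90", "54", "0", "1", "90", "1", "90", "1e-8", "60"],
]
templates = pick_templates("\n".join("\t".join(r) for r in rows))
print(templates)
```

Here seqB gets no template because its only hit falls below the identity cut-off; in a real pipeline such a sequence would be aligned without structural guidance.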

RERO DOC Digital Library

BlastR—fast and accurate database searches for non-coding RNAs

Author: Beaudoing Emmanuel
Bucher Philipp
Bussotti Giovanni
Erb Ionas
Notredame Cedric
Raineri Emanuele
Wilm Andreas
Zytnicki Matthias
Publication date: 02/08/2017

We present and validate BlastR, a method for efficiently and accurately searching for non-coding RNAs. Our approach relies on the comparison of di-nucleotides using BlosumR, a new log-odds substitution matrix. In order to use BlosumR for comparison, we recode RNA sequences into protein-like sequences. We then show that BlosumR can be used along with the BlastP algorithm to search non-coding RNA sequences. Using Rfam as a gold standard, we benchmarked this approach and show BlastR to be more sensitive than BlastN. We also show that BlastR is both faster and more sensitive than BlastP used with a single-nucleotide log-odds substitution matrix. BlastR, when used in combination with WU-BlastP, is about 5% more accurate than WU-BlastN and about 50 times slower. The approach shown here is equally effective when combined with the NCBI-Blast package. The software is open-source freeware available from www.tcoffee.org/blastr.htm
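The recoding step can be pictured as mapping each of the 16 possible dinucleotides to one protein letter, so that a protein search engine scoring with a 16-letter matrix such as BlosumR sees di-nucleotide context. This sketch assumes overlapping windows and uses a hypothetical letter assignment; BlastR's actual encoding table is not given in the abstract.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
# Hypothetical assignment of 16 amino-acid codes, one per dinucleotide.
PROTEIN_LETTERS = "ACDEFGHIKLMNPQRS"
DINUC_TO_LETTER = {
    a + b: PROTEIN_LETTERS[i]
    for i, (a, b) in enumerate(product(NUCLEOTIDES, repeat=2))
}

def recode_rna(seq: str) -> str:
    """Recode RNA (or DNA) as overlapping dinucleotides, one letter per window.

    Only A, C, G, U/T are handled; ambiguity codes raise KeyError.
    """
    seq = seq.upper().replace("T", "U")
    return "".join(DINUC_TO_LETTER[seq[i:i + 2]] for i in range(len(seq) - 1))

print(recode_rna("ACGU"))
```

A sequence of length n becomes a protein-like string of length n - 1, which can then be written to FASTA and searched with a BlastP-style tool given a matching 16-letter substitution matrix.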

RERO DOC Digital Library

Aubergene - a sensitive genome alignment tool.

Author: Arslan
Heger
J. Heringa
Karolchik
Kellis
Miller
Morgenstern
Murphy
Notredame
Park
R. Szklarczyk
Thomas
Vingron
Waterston
Ye
Zhang
Publication date: 01/01/2006

Motivation: The accumulation of genome sequences will only accelerate in the coming years. We aim to use this abundance of data to improve the quality of genomic alignments and to devise a method capable of detecting regions evolving under weak or no evolutionary constraints. Results: We describe a genome alignment program, AuberGene, which explores the idea of transitivity of local alignments. The program was assessed on a 2 Mbp genomic region containing the CFTR gene in 13 species. In this region, we can identify 53% of human sequence sharing common ancestry with mouse, compared with 44% found using the usual pairwise alignment. Between human and tetraodon, 93 orthologous exons are found, compared with 77 detected by the pairwise human-tetraodon comparison. AuberGene allows the user to (1) identify distant, previously undetected, conserved orthologous regions such as ORFs or regulatory regions; (2) identify neutrally evolving regions in related species, which are often overlooked by other alignment programs; and (3) recognize false orthologous genomic regions. The increased sensitivity of the method is not obtained at the cost of reduced specificity. Our results suggest that, over the CFTR region, human shares 10% more sequence with mouse than previously thought (∼50%, instead of 40% found with the pairwise alignment). © 2006 Oxford University Press
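The transitivity idea can be illustrated by composing position maps: if human position a aligns to position b in an intermediate genome, and b aligns to mouse position c, then a and c are inferred to correspond even when no direct human-mouse alignment covers them. This is a simplified sketch, not AuberGene's scoring; the dog intermediate and the position maps are hypothetical toy data.

```python
def compose(a_to_b: dict[int, int], b_to_c: dict[int, int]) -> dict[int, int]:
    """Infer a->c correspondences through b: keep a-positions whose b-partner is aligned to c."""
    return {a: b_to_c[b] for a, b in a_to_b.items() if b in b_to_c}

def extend_by_transitivity(direct: dict[int, int], a_to_b: dict[int, int],
                           b_to_c: dict[int, int]) -> dict[int, int]:
    """Union of direct and transitively inferred pairs; direct evidence wins on conflicts."""
    merged = compose(a_to_b, b_to_c)
    merged.update(direct)
    return merged

human_mouse = {0: 0, 1: 1}              # direct pairwise alignment (toy)
human_dog = {0: 0, 1: 1, 2: 2, 3: 3}    # hypothetical intermediate species
dog_mouse = {2: 2, 3: 4}
merged = extend_by_transitivity(human_mouse, human_dog, dog_mouse)
print(sorted(set(merged) - set(human_mouse)))  # human positions recovered only via dog
```

Human positions 2 and 3 are linked to mouse only through the intermediate genome, which mirrors how transitivity can raise the fraction of human sequence found to share ancestry with mouse.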

Crossref

VU Research Portal

Multiple sequence alignment based on set covers

Author: A. Bahr
B. Manthey
B. Morgenstern
B. Morgenstern
C. Notredame
D. Gusfield
G. Vogt
J.D. Thompson
K. Katoh
O. Gotoh
P. Zhao
R.E. Green
R.F. Smith
S. Henikoff
T. Müller
T.P. Li
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2004

We introduce a new heuristic for the multiple alignment of a set of sequences. The heuristic is based on a set cover of the residue alphabet of the sequences, and also on the determination of a significant set of blocks comprising subsequences of the sequences to be aligned. These blocks are obtained with the aid of a new data structure, called a suffix-set tree, which is constructed from the input sequences with the guidance of the residue-alphabet set cover and generalizes the well-known suffix tree of the sequence set. We provide performance results on selected BAliBASE amino-acid sequences and compare them with those yielded by some prominent approaches.
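The role of the alphabet set cover can be illustrated with a relaxed notion of matching: two residues are treated as equal if some set of the cover contains both. The cover below is a hypothetical grouping, not the one used in the paper, and the abstract does not detail the suffix-set tree, so this sketch swaps in a plain quadratic dynamic program (longest common substring) to find one shared block between two sequences.

```python
# Hypothetical cover of the 20 amino acids; sets may overlap (S appears twice).
COVER = [set("ILMV"), set("FWY"), set("KRH"), set("DE"),
         set("STNQ"), set("AGS"), set("C"), set("P")]

def compatible(a: str, b: str) -> bool:
    """Two residues match if at least one cover set contains both."""
    return any(a in s and b in s for s in COVER)

def longest_common_block(x: str, y: str) -> tuple[int, int, int]:
    """Longest run of pairwise-compatible residues shared by x and y.

    Returns (start in x, start in y, length). Quadratic DP, not a suffix-set tree.
    """
    best = (0, 0, 0)
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if compatible(x[i - 1], y[j - 1]):
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best

print(longest_common_block("MKVLE", "AKILD"))
```

Because the sets of a cover may overlap, compatibility is not transitive (S matches A and T, but A does not match T), which is one reason a cover calls for a dedicated data structure rather than a simple recoding of the alphabet.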

arXiv.org e-Print Archive

CiteSeerX

Crossref

T-Coffee: a web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension

Author: Chang Jia-Ming
Di Tommaso Paolo
Montanyola Alberto
Moretti Sebastien
Notredame Cedric
Orobitg Miquel
Taly Jean-François
Xenarios Ioannis
Publication date: 02/08/2017

This article introduces a new interface for T-Coffee, a consistency-based multiple sequence alignment program. This interface provides easy and intuitive access to the most popular functionality of the package. This includes the default T-Coffee mode for protein and nucleic acid sequences, the M-Coffee mode, which allows combining the outputs of any other aligners, and the template-based modes of T-Coffee, which deliver high-accuracy alignments using structural or homology-derived templates. The three available template modes are Expresso for the alignment of proteins with known 3D structures, R-Coffee to align RNA sequences with conserved secondary structures, and PSI-Coffee to accurately align distantly related sequences using homology extension. The new server benefits from recent improvements of the T-Coffee algorithm, can align up to 150 sequences as long as 10 000 residues, and is available from both http://www.tcoffee.org and its main mirror http://tcoffee.crg.ca
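The M-Coffee idea of combining other aligners can be sketched by turning each input alignment into a set of aligned residue pairs and weighting every pair by the fraction of aligners that support it. This weighting is a simplifying assumption for illustration, not M-Coffee's exact scheme, and the two gapped alignments stand in for the output of two hypothetical tools.

```python
from collections import Counter
from itertools import combinations

Residue = tuple[str, int]   # (sequence name, 0-based residue index, gaps not counted)

def aligned_pairs(msa: dict[str, str]) -> set[tuple[Residue, Residue]]:
    """All residue pairs that share a column in one gapped alignment of equal-length rows."""
    next_index = {name: 0 for name in msa}
    width = len(next(iter(msa.values())))
    pairs: set[tuple[Residue, Residue]] = set()
    for col in range(width):
        column = []
        for name, row in msa.items():
            if row[col] != "-":
                column.append((name, next_index[name]))
                next_index[name] += 1
        pairs.update(combinations(sorted(column), 2))
    return pairs

def consensus_library(msas: list[dict[str, str]]) -> dict[tuple[Residue, Residue], float]:
    """Weight each residue pair by the fraction of input alignments that contain it."""
    support: Counter = Counter()
    for msa in msas:
        support.update(aligned_pairs(msa))
    return {pair: count / len(msas) for pair, count in support.items()}

aln_tool1 = {"s1": "AC-G", "s2": "ACTG"}
aln_tool2 = {"s1": "A-CG", "s2": "ACTG"}
library = consensus_library([aln_tool1, aln_tool2])
for pair, weight in sorted(library.items()):
    print(pair, weight)
```

Pairs on which both tools agree get weight 1.0, while the disputed placement of the second residue of s1 is split 0.5/0.5; a consistency-based aligner can then resolve such conflicts using the rest of the library.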

RERO DOC Digital Library

Evolution of genes and repeats in the Nimrod superfamily

Author: Andrade
B. Sipos
Bork
Bork
Callebaut
Chen
D. Hultmark
Do
Doliana
E. Kurucz
Edgar
Evans
Finn
Guindon
Holt
Huelsenbeck
Hughes
I. Ando
J. Zsamboki
Ju
K. Somogyi
Kumar
Kumar
Kurucz
Liao
Mangahas
McAllister
Morgenstern
Nei
Nei
Nei
Nei
Nishikawa
Notredame
Ota
Parmley
Posada
Quesada
Redelings
Russo
Schuster-Böckler
Simmons
Stajich
Strimmer
Swanson
Swidan
Thompson
Xia
Z. Penzes
Zdobnov
Zhang
Zou
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2008

The recently identified Nimrod superfamily is characterized by the presence of a special type of EGF repeat, the NIM repeat, located right after a typical CCXGY/W amino acid motif. On the basis of structural features, nimrod genes can be divided into three types. The proteins encoded by Draper-type genes have an EMI domain at the N-terminal part and only one copy of the NIM motif, followed by a variable number of EGF-like repeats. The products of Nimrod B-type and Nimrod C-type genes (including the eater gene) have different kinds of N-terminal domains, and lack EGF-like repeats but contain a variable number of NIM repeats. Draper and Nimrod C-type (but not Nimrod B-type) proteins carry a transmembrane domain. Several members of the superfamily have been proposed to function as receptors in phagocytosis and/or binding of bacteria, which indicates an important role in cellular immunity and in the elimination of apoptotic cells. In this paper, the evolution of the Nimrod superfamily is studied with various methods at the level of genes and repeats. A hypothesis is presented in which the NIM repeat, along with the EMI domain, emerged by structural reorganizations at the end of an EGF-like repeat chain, suggesting a mechanism for the formation of novel types of repeats. The analyses revealed diverse evolutionary patterns in the sequences containing multiple NIM repeats. Although the Nimrod B and Nimrod C proteins show characteristics of independent evolution, many internal NIM repeats in Eater sequences seem to have undergone concerted evolution. An analysis of the nimrod genes has been performed using phylogenetic and other methods, and an evolutionary scenario of the origin and diversification of the Nimrod superfamily is proposed. Our study presents an intriguing example of how the evolution of multigene families may contribute to the complexity of the innate immune response.
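The CCXGY/W motif that precedes the NIM repeat can be expressed as a regular expression: two cysteines, any residue, a glycine, then tyrosine or tryptophan. This minimal sketch only flags candidate motif positions; recognizing an actual NIM repeat would also require checking the repeat that follows, which is out of scope here. The input sequence is a made-up toy, not a real Nimrod protein.

```python
import re

# C, C, any residue (X), G, then Y or W: the CCXGY/W motif.
CCXGYW = re.compile(r"CC[A-Z]G[YW]")

def find_ccxgyw(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping CCXGY/W occurrence."""
    return [(m.start(), m.group()) for m in CCXGYW.finditer(protein.upper())]

print(find_ccxgyw("MKTCCAGYPLNCCRGWQ"))  # toy sequence
```

`finditer` reports non-overlapping matches, which is adequate for this motif because two occurrences cannot share their leading CC without also sharing the rest of the pattern.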

Crossref

Repository of the Academy's Library

Identification of an osteocalcin isoform in fish with a large acidic prodomain

Author: Amores
Boskey
Chomczynski
Christensen
Crooks
Ducy
Engelke
Frazão
Gorski
Gundberg
Hauschka
Hauschka
He
Hosoda
Hunter
Huq
Jaillon
Kumar
Laizé
Nakayama
Nielsen
Nishimoto
Notredame
Patthy
Poser
Price
Sommer
Stothard
Taylor
Thompson
Woods
Publication venue: American Society for Biochemistry & Molecular Biology (ASBMB)
Publication date: 03/06/2014

Osteocalcin is a small, secreted bone protein whose gene consists of four exons. In the course of analyzing the structure of fish osteocalcin genes, we recently found that the spotted green pufferfish has two possible exon 2 structures, one of 15 bp and the other of 324 bp. Subsequent analysis of the pufferfish cDNA showed that only the transcript with the large exon 2 exists. Exon 2 codes for the osteocalcin propeptide, and exon 2 of pufferfish osteocalcin is ∼3.4-fold larger than the exon 2 previously found in other vertebrate species. We have termed this new pufferfish osteocalcin isoform OC2. Additional studies showed that the OC2 isoform is restricted to a unique fish taxonomic group, the Osteichthyes; OC2 is the only osteocalcin isoform found so far in six Osteichthyes species, whereas both OC1 and OC2 isoforms coexist in zebrafish and rainbow trout. The larger size of the OC2 propeptide is due to an acidic region that is likely to be highly phosphorylated and has no counterpart in the OC1 propeptide. We propose 1) that OC1 and OC2 are encoded by distinct genes that originated from a duplication event that probably occurred in the teleost fish lineage soon after divergence from tetrapods and 2) that the novel OC2 propeptide could be, if secreted, a phosphoprotein that participates in the regulation of biomineralization through its large acidic and phosphorylated propeptide.

Crossref

Sapientia (Univ. do Algarve)

Lessons Learned: Recommendations for Establishing Critical Periodic Scientific Benchmarking

Author: Capella-Gutierrez Salvador
de la Iglesia Diana
Dessimoz Christophe
Fernandez José M.
Gelpí Josep Lluís
Haas Juergen
Lourenco Analia
Notredame Cedric
Repchevsky Dmitry
Schwede Torsten
Valencia Alfonso
Publication date: 01/01/2017

The dependence of life scientists on software has steadily grown in recent years. For many tasks, researchers have to decide which of the available bioinformatics tools are most suitable for their specific needs. Additionally, researchers should be able to objectively select the software that provides the highest accuracy, the best efficiency and the highest level of reproducibility when integrated into their research projects. Critical benchmarking of bioinformatics methods, tools and web services is therefore an essential community service, as well as a critical component of reproducibility efforts. Unbiased and objective evaluations are challenging to set up and can only be effective when built and implemented around community-driven efforts, as demonstrated by the many ongoing community challenges in bioinformatics that followed the success of CASP. Community challenges bring the combined benefits of intense collaboration, transparency and standard harmonization. Only open systems for the continuous evaluation of methods offer a perfect complement to community challenges: they give larger communities of users, which can extend far beyond the community of developers, a window onto the status of developments that they can use for their specific projects. By continuous evaluation systems we mean services that are always available and that periodically update their data and/or metrics according to a predefined schedule, bearing in mind that performance must always be interpreted in terms of each research domain. We argue here that technology is now mature enough to bring community-driven benchmarking efforts to a higher level, one that should allow effective interoperability of benchmarks across related methods. New technological developments make it possible to overcome the limitations of early experiences with online benchmarking, e.g. EVA.
We therefore describe OpenEBench, a novel infrastructure designed to establish a continuous automated benchmarking system for bioinformatics methods, tools and web services. OpenEBench is being developed to cater to the needs of the bioinformatics community, especially software developers who need an objective and quantitative way to inform their decisions, as well as the larger community of end users searching for unbiased and up-to-date evaluations of bioinformatics methods. As such, OpenEBench should soon become a central place for bioinformatics software developers, community-driven benchmarking initiatives, researchers using bioinformatics methods, and funders interested in the results of method evaluations.

Preprint

UPCommons. Portal del coneixement obert de la UPC