341 research outputs found
Detecting patterns of species diversification in the presence of both rate shifts and mass extinctions
Recent methodological advances are enabling better examination of speciation
and extinction processes and patterns. A major open question is the origin of
large discrepancies in species number between groups of the same age. Existing
frameworks to model this diversity either focus on changes between lineages,
neglecting global effects such as mass extinctions, or focus on changes over
time which would affect all lineages. Yet it seems probable that both lineage
differences and mass extinctions affect the same groups. Here we used
simulations to test the performance of two widely used methods, under complex
scenarios. We report good performance, although with a tendency to
over-predict events when increasing the complexity of the scenario. Overall, we
find that lineage shifts are better detected than mass extinctions. This work
has significance for assessing the methods currently used for estimating
changes in diversification using phylogenies and developing new tests. Comment: 34 pages, 11 figures
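The kind of scenario described in this abstract can be illustrated with a minimal sketch. It is not the authors' simulation setup: it only counts lineages under a constant-rate birth-death process with one mass extinction, and it does not model rate shifts between lineages. The function name and the parameters `lam`, `mu`, `t_mass`, and `survival` are hypothetical and chosen for illustration.

```python
import random

def simulate_lineage_count(lam, mu, t_end, t_mass, survival, rng):
    """Count lineages alive at t_end, starting from one lineage at time 0.

    Assumed model (illustrative only): speciation rate lam and extinction
    rate mu per lineage, with lam + mu > 0, plus a single mass extinction
    at t_mass where each lineage survives with probability `survival`.
    """
    n, t, mass_done = 1, 0.0, False
    while n > 0:
        # Waiting time to the next speciation or extinction event
        t_next = t + rng.expovariate(n * (lam + mu))
        if not mass_done and t_mass <= min(t_next, t_end):
            # Apply the mass extinction first, then restart the clock there
            # (valid because exponential waiting times are memoryless)
            n = sum(1 for _ in range(n) if rng.random() < survival)
            t, mass_done = t_mass, True
            continue
        if t_next > t_end:
            break
        t = t_next
        # The event is a speciation with probability lam / (lam + mu)
        n += 1 if rng.random() < lam / (lam + mu) else -1
    return n

rng = random.Random(42)  # fixed seed so runs are repeatable
counts = [simulate_lineage_count(1.0, 0.5, 5.0, 3.0, 0.1, rng) for _ in range(5)]
print(counts)
```

Repeating such runs with and without the mass extinction gives a rough sense of how a global event changes the spread of species numbers among groups of the same age.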
IQRray, a new method for Affymetrix microarray quality control, and the homologous organ conservation score, a new benchmark method for quality control metrics
Motivation: Microarray results accumulated in public repositories are widely reused in meta-analytical studies and secondary databases. The quality of the data obtained with this technology varies from experiment to experiment, and an efficient method for quality assessment is necessary to ensure their reliability. Results: The lack of a good benchmark has hampered evaluation of existing methods for quality control. In this study, we propose a new independent quality metric that is based on evolutionary conservation of expression profiles. We show, using 11 large organ-specific datasets, that IQRray, a new quality metric developed by us, exhibits the highest correlation with this reference metric, among 14 metrics tested. IQRray outperforms other methods in identification of poor quality arrays in datasets composed of arrays from many independent experiments. In contrast, the performance of methods designed for detecting outliers in a single experiment, like Normalized Unscaled Standard Error and Relative Log Expression, was low because of the inability of these methods to detect datasets containing only low-quality arrays and because the scores cannot be directly compared between experiments. Availability and implementation: The R implementation of IQRray is available at: ftp://lausanne.isb-sib.ch/pub/databases/Bgee/general/IQRray.R. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
Generating Homology Relationships by Alignment of Anatomical Ontologies
The anatomy of model species is described in ontologies, which are used to standardize the annotations of experimental data, such as gene expression patterns. To compare such data between species, we aim to establish homology relations between ontologies describing different species. We present a new algorithm, and its implementation in the software Homolonto, to create new relationships between anatomical ontologies, based on the homology concept. These relationships and the Homolonto software are available at http://bgee.unil.ch/.
Correcting for the bias due to expression specificity improves the estimation of constrained evolution of expression between mouse and human
Motivation: Comparative analyses of gene expression data from different species have become an important component of the study of molecular evolution. Thus methods are needed to estimate evolutionary distances between expression profiles, as well as a neutral reference to estimate selective pressure. Divergence between expression profiles of homologous genes is often calculated with Pearson's or Euclidean distance. Neutral divergence is usually inferred from randomized data. Despite being widely used, neither of these two steps has been well studied. Here, we analyze these methods formally and on real data, highlight their limitations and propose improvements. Results: It has been demonstrated that Pearson's distance, in contrast to Euclidean distance, leads to underestimation of the expression similarity between homologous genes with a conserved uniform pattern of expression. Here, we first extend this study to genes with a conserved but specific pattern of expression. Surprisingly, we find that both Pearson's and Euclidean distances used as a measure of expression similarity between genes depend on the expression specificity of those genes. We also show that the Euclidean distance depends strongly on data normalization. Next, we show that the randomization procedure that is widely used to estimate the rate of neutral evolution is biased when broadly expressed genes are abundant in the data. To overcome this problem, we propose a novel randomization procedure that is unbiased with respect to expression profiles present in the datasets. Applying our method to the mouse and human gene expression data suggests significant gene expression conservation between these species. Contact: [email protected]; [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
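The contrast between the two distances named in this abstract can be shown with a small sketch. It assumes that Pearson's distance means one minus the Pearson correlation coefficient, and the expression values are made up for illustration; they are not data from the study, and the sketch does not implement the authors' randomization procedure.

```python
import math

def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def euclidean_distance(x, y):
    """Return the Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical expression levels across five tissues (illustrative only)
uniform_mouse = [10.0, 10.1, 9.9, 10.0, 10.1]   # broadly, uniformly expressed
uniform_human = [10.1, 9.9, 10.0, 10.1, 10.0]
specific_mouse = [1.0, 1.1, 0.9, 12.0, 1.0]     # expressed mainly in tissue 4
specific_human = [1.1, 0.9, 1.0, 11.8, 1.1]

for label, m, h in [("uniform", uniform_mouse, uniform_human),
                    ("specific", specific_mouse, specific_human)]:
    print(label, pearson_distance(m, h), euclidean_distance(m, h))
```

In this toy case the uniform pair differs only by small noise, so its Euclidean distance is small, yet its Pearson distance is large because correlation is computed on that noise alone. The tissue-specific pair scores as highly similar under Pearson's distance.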
Comparative analysis of human and mouse expression data illuminates tissue-specific evolutionary patterns of miRNAs
MicroRNAs (miRNAs) constitute an important class of gene regulators. While models have been proposed to explain their appearance and expansion, the validation of these models has been difficult due to the lack of comparative studies. Here, we analyze miRNA evolutionary patterns in two mammals, human and mouse, in relation to the age of miRNA families. In this comparative framework, we confirm some predictions of previously advanced models of miRNA evolution, e.g. that miRNAs arise more frequently de novo than by duplication, or that the number of protein-coding genes targeted by miRNAs decreases with evolutionary time. We also corroborate that miRNAs display an increase in expression level with evolutionary time; however, we show that this relation is largely tissue-dependent, and especially low in embryonic or nervous tissues. We identify a bias of tag-sequencing techniques regarding the assessment of breadth of expression, leading us, contrary to predictions, to find more tissue-specific expression of older miRNAs. Together, our results refine the models used so far to depict the evolution of miRNA genes. They underline the role of tissue-specific selective forces on the evolution of miRNAs, as well as the potential co-evolution patterns between miRNAs and the protein-coding genes they target.
Selectome: a database of positive selection
Genome wide scans have shown that positive selection is relatively frequent at the molecular level. It is of special interest to identify which protein sites and which phylogenetic branches are affected. We present Selectome, a database which provides the results of a rigorous branch-site specific likelihood test for positive selection. The Web interface presents test results mapped both onto phylogenetic trees and onto protein alignments. It allows rapid access to results by keyword, gene name, or taxonomy based queries. Selectome is freely available at http://bioinfo.unil.ch/selectom
Homolonto: generating homology relationships by pairwise alignment of ontologies and application to vertebrate anatomy
Motivation: The anatomy of model species is described in ontologies, which are used to standardize the annotations of experimental data, such as gene expression patterns. To compare such data between species, we need to establish relations between ontologies describing different species. Results: We present a new algorithm, and its implementation in the software Homolonto, to create new relationships between anatomical ontologies, based on the homology concept. Homolonto uses a supervised ontology alignment approach. Several alignments can be merged, forming homology groups. We also present an algorithm to generate relationships between these homology groups. This has been used to build a multi-species ontology, for the database of gene expression evolution Bgee. Availability: download section of the Bgee website http://bgee.unil.ch/ Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
Molecular Evolution and Gene Function
One of the basic questions of phylogenomics is how gene function evolves, whether among species or inside gene families. In this chapter, we provide a brief overview of the problems associated with defining gene function in a manner which allows comparisons which are both large scale and evolutionarily relevant. The main source of functional data, despite its limitations, is transcriptomics. Functional data provides information on evolutionary mechanisms primarily by showing which functional classes of genes evolve under stronger or weaker purifying or adaptive selection, and on which classes of mutations (e.g., substitutions or duplications). However, the example of the "ortholog conjecture" shows that we are still not at a point where we can confidently study phylogenomically the evolution of gene function at a precise scale.
gcodeml: A Grid-enabled Tool for Detecting Positive Selection in Biological Evolution
One of the important questions in biological evolution is to know if certain
changes along protein coding genes have contributed to the adaptation of
species. This problem is known to be biologically complex and computationally
very expensive. It, therefore, requires efficient Grid or cluster solutions to
overcome the computational challenge. We have developed a Grid-enabled tool
(gcodeml) that relies on the PAML (codeml) package to help analyse large
phylogenetic datasets on both Grids and computational clusters. Although we
report on results for gcodeml, our approach is applicable and customisable to
related problems in biology or other scientific domains. Comment: 10 pages, 4 figures. To appear in the HealthGrid 2012 con