34 research outputs found

    The SIB Swiss Institute of Bioinformatics' resources: focus on curated databases

    The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) provides world-class bioinformatics databases, software tools, services and training to the international life science community in academia and industry. These solutions allow life scientists to turn the exponentially growing amount of data into knowledge. Here, we provide an overview of SIB's resources and competence areas, with a strong focus on curated databases and SIB's most popular and widely used resources. In particular, SIB's Bioinformatics resource portal ExPASy features over 150 resources, including UniProtKB/Swiss-Prot, ENZYME, PROSITE, neXtProt, STRING, UniCarbKB, SugarBindDB, SwissRegulon, EPD, arrayMap, Bgee, SWISS-MODEL Repository, OMA, OrthoDB and other databases, which are briefly described in this article.

    State aggregation for fast likelihood computations in molecular evolution

    Motivation: Codon models are widely used to identify the signature of selection at the molecular level and to test for changes in selective pressure during the evolution of protein-coding genes. The large state space of the Markov processes used to model codon evolution makes these models difficult to apply to large biological datasets. We propose using state aggregation to reduce the state space of codon models and thus improve the computational performance of likelihood estimation. Results: We show that this heuristic speeds up likelihood computations under the M0 and branch-site models by up to 6.8 times. Simulations show that state aggregation does not introduce a detectable bias. On a real dataset, predictions obtained with aggregation are highly correlated with those from the full likelihood computations. Finally, state aggregation is a very general approach that can be applied to any model based on a continuous-time Markov process with a large state space, such as amino acid and coevolution models; we therefore discuss different ways to apply it to Markov models used in phylogenetics. Availability: The heuristic is implemented in the godon package (https://bitbucket.org/Davydov/godon) and in a version of FastCodeML (https://gitlab.isb-sib.ch/phylo/fastcodeml).
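
    The sketch below is a minimal, generic illustration of state aggregation (approximate lumping) for a continuous-time Markov rate matrix; it is not the godon or FastCodeML implementation, and the helper names and the stationary-weighted averaging rule are assumptions chosen for illustration. Each group of original states becomes one aggregated state, and the rate from group A to group B is the stationary-weighted average, over the states in A, of their total rate into B. This is exact only when the chain is lumpable with respect to the partition; otherwise it is an approximation.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def aggregate_rate_matrix(q: Sequence[Sequence[float]],
                          pi: Sequence[float],
                          groups: Sequence[Sequence[int]]) -> Matrix:
    """Lump the rate matrix q into one state per group (hypothetical helper).

    q_agg[a][b] = sum_{i in A} pi[i] * sum_{j in B} q[i][j] / pi_A,
    i.e. the stationary-weighted mean rate from the states of A into B.
    Rows of q sum to zero, so rows of q_agg also sum to zero.
    """
    k = len(groups)
    q_agg = [[0.0] * k for _ in range(k)]
    for a, group_a in enumerate(groups):
        pi_a = sum(pi[i] for i in group_a)  # stationary mass of group A
        for b, group_b in enumerate(groups):
            flow = sum(pi[i] * q[i][j] for i in group_a for j in group_b)
            q_agg[a][b] = flow / pi_a
    return q_agg


def aggregate_distribution(pi: Sequence[float],
                           groups: Sequence[Sequence[int]]) -> List[float]:
    """Stationary probability of each aggregated state."""
    return [sum(pi[i] for i in group) for group in groups]


# Toy 3-state chain: merge states {0, 1} and keep state 2 on its own.
q = [[-2.0, 1.0, 1.0],
     [1.0, -2.0, 1.0],
     [1.0, 1.0, -2.0]]
pi = [1 / 3, 1 / 3, 1 / 3]
groups = [[0, 1], [2]]
q_small = aggregate_rate_matrix(q, pi, groups)
pi_small = aggregate_distribution(pi, groups)
```

    For this toy chain the aggregated generator is, up to rounding, [[-1, 1], [2, -2]]: the merged state leaves only towards state 2 (rate 1), while state 2 reaches the merged state at a combined rate of 2. The toy chain happens to be lumpable, so the aggregation is exact here; likelihood computations then use a 2 × 2 instead of a 3 × 3 matrix, and the saving grows with the size of the original state space.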

    Large-Scale Comparative Analysis of Codon Models Accounting for Protein and Nucleotide Selection

    There are numerous sources of variation in the rate of synonymous substitutions within genes, such as direct selection on the nucleotide sequence or mutation rate variation. Yet scans for positive selection rely on codon models that assume an effectively neutral synonymous substitution rate, constant across the sites of each gene. Here we perform a large-scale comparison of approaches that incorporate variation in codon substitution rates, and we propose our own simple yet effective modification of existing models. We find strong effects of substitution rate variation on the inference of positive selection: more than 70% of the genes detected by the classical branch-site model are presumably false positives caused by the incorrect assumption of a uniform synonymous substitution rate. We propose a new model that is strongly favored by the data while remaining computationally tractable. With the new model we can capture signatures of nucleotide-level selection acting on translation initiation and on splicing sites within the coding region. Finally, we show that rate variation is highest in highly recombining regions, and we propose that recombination and mutation rate variation, such as a high CpG mutation rate, are the two main sources of nucleotide rate variation. Although we detect fewer genes under positive selection in Drosophila than with models that ignore rate variation, the genes we detect contain a stronger signal of adaptation of dynein, which could be associated with Wolbachia infection. We provide software to perform positive selection analysis using the new model.
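
    How the new model was shown to be "strongly favored by the data" is not stated in this abstract. When two codon models are nested, one common check is a likelihood ratio test, and the sketch below assumes that setting. The log-likelihoods, the number of extra parameters and the helper names are all hypothetical, and the plain chi-square reference distribution is itself an approximation (it can be conservative when a parameter lies on a boundary of its range).

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper tail P(X > x) of a chi-square distribution with df degrees of freedom.

    Sums the power series of the regularized lower incomplete gamma
    P(df/2, x/2) and returns 1 - P; adequate for moderate statistics,
    but very small p-values lose precision.
    """
    if x <= 0.0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = 1.0 / s  # n = 0 term of the series
    total = term
    n = 1
    while term > total * 1e-16:
        term *= z / (s + n)
        total += term
        n += 1
    log_p = s * math.log(z) - z - math.lgamma(s) + math.log(total)
    return max(0.0, 1.0 - math.exp(log_p))


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, extra_params: int):
    """Return (2 * delta lnL, p-value) for nested models (hypothetical helper)."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, extra_params)


# Hypothetical log-likelihoods for a model without and with rate variation.
stat, p_value = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-995.0, extra_params=2)
```

    With these made-up values the statistic is 10 and, for 2 degrees of freedom, the p-value is exp(-5), roughly 0.0067. Non-nested models would instead be compared with an information criterion such as AIC.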

    Duplication history and molecular evolution of the rbcS multigene family in angiosperms

    The rbcS multigene family evolved through complex duplication events leading to species-specific gene copies. Selection and coevolution with rbcL constrained rbcS evolution, thereby limiting the divergence of each gene copy.

    Gene-set Enrichment with Regularized Regression

    Motivation: Canonical methods for gene-set enrichment analysis assume independence between gene-sets. In practice, heterogeneous gene-sets from diverse sources are frequently combined, resulting in gene-sets with overlapping genes. Such overlaps compromise statistical modelling and complicate the interpretation of results. Results: We rephrase gene-set enrichment as a regression problem. Given some genes of interest (e.g. a list of hits from an experiment) and gene-sets (e.g. functional annotations or pathways), we aim to identify a sparse list of gene-sets for the genes of interest. In a regression framework, this amounts to identifying a minimum set of gene-sets that optimally predicts whether any gene belongs to the given genes of interest. To accommodate redundancy between gene-sets, we propose regularized regression techniques such as the elastic net. We report that regression-based results are consistent with established gene-set enrichment methods but more parsimonious and interpretable. Availability: We implement the model in gerr (gene-set enrichment with regularized regression), an R package freely available at https://github.com/TaoDFang/gerr and submitted to Bioconductor. Code and data required to reproduce the results of this study are available at https://github.com/TaoDFang/GeneModuleAnnotationPaper. Contact: Jitao David Zhang ([email protected]), Roche Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland.
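
    gerr itself is an R package; the Python sketch below only illustrates the regression framing described in the abstract and is not its code. It builds a binary gene × gene-set membership matrix, uses membership in the genes of interest as the response, and fits an elastic net by coordinate descent. The least-squares (Gaussian) loss, the fixed penalty `lam`, the mixing weight `alpha`, and all function, gene and gene-set names are assumptions for illustration; a real analysis would typically choose the penalty by cross-validation.

```python
from typing import Dict, List, Sequence, Set


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value towards zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def gene_set_elastic_net(universe: Sequence[str],
                         gene_sets: Dict[str, Set[str]],
                         genes_of_interest: Set[str],
                         lam: float = 0.1,
                         alpha: float = 0.5,
                         max_iter: int = 1000,
                         tol: float = 1e-10) -> Dict[str, float]:
    """Elastic-net coefficient per gene-set (hypothetical helper).

    Minimises (1/2n)||y - Xb||^2 + lam * (alpha*|b|_1 + (1-alpha)/2*|b|_2^2)
    with X[g][s] = 1 if gene g is in set s and y[g] = 1 if g is of interest.
    X and y are centred so that no intercept term is needed.
    """
    n = len(universe)
    names = list(gene_sets)
    columns: List[List[float]] = []
    for name in names:
        raw = [1.0 if g in gene_sets[name] else 0.0 for g in universe]
        mean = sum(raw) / n
        columns.append([v - mean for v in raw])
    y = [1.0 if g in genes_of_interest else 0.0 for g in universe]
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual y - Xb with b = 0
    sq_norm = [sum(v * v for v in col) / n for col in columns]
    beta = [0.0] * len(names)
    for _ in range(max_iter):
        max_change = 0.0
        for j, col in enumerate(columns):
            denom = sq_norm[j] + lam * (1.0 - alpha)
            if denom == 0.0:
                continue  # constant column under a pure lasso penalty
            # Correlation of column j with the residual that ignores set j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam * alpha) / denom
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return dict(zip(names, beta))


universe = [f"g{i}" for i in range(1, 9)]
gene_sets = {"setA": {"g1", "g2", "g3"},
             "setB": {"g1", "g2", "g3", "g4"},  # overlaps setA
             "setC": {"g6", "g7"}}
hits = {"g1", "g2", "g3"}
mostly_lasso = gene_set_elastic_net(universe, gene_sets, hits, alpha=0.9)
balanced = gene_set_elastic_net(universe, gene_sets, hits, alpha=0.5)
```

    With `alpha=0.9` only `setA` receives a non-zero weight (about 0.59); with `alpha=0.5` the overlapping `setB` also gets a small positive weight (about 0.09) while `setA` stays dominant, and the unrelated `setC` is zero in both fits. Lower `alpha` thus spreads weight across redundant gene-sets, while higher `alpha` favours the sparser answer.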

    Additional file 8 of Genome rearrangements and selection in multi-chromosome bacteria Burkholderia spp.

    Figure S8. Phylogenetic tree constructed based on the concatenation of alignments of genes forming the genomic island. (PDF 6 kb)
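
    The caption describes a tree built from concatenated alignments. A minimal sketch of the concatenation step alone, assuming each per-gene alignment is held as a mapping from strain name to aligned sequence; the function name and strain labels are hypothetical, strains missing from a gene are padded with gaps (one common convention, assumed here), and the tree inference itself is not shown.

```python
from typing import Dict, List, Tuple


def concatenate_alignments(alignments: List[Dict[str, str]],
                           gap: str = "-") -> Tuple[Dict[str, str], List[Tuple[int, int]]]:
    """Join per-gene alignments into one supermatrix (hypothetical helper).

    Returns the concatenated sequence per strain and the (start, end)
    column range occupied by each gene, e.g. for partitioned analyses.
    """
    strains = sorted({s for aln in alignments for s in aln})
    pieces: Dict[str, List[str]] = {s: [] for s in strains}
    partitions: List[Tuple[int, int]] = []
    start = 0
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("each alignment needs sequences of one common length")
        length = lengths.pop()
        for strain in strains:
            # Strains absent from this gene get an all-gap block.
            pieces[strain].append(aln.get(strain, gap * length))
        partitions.append((start, start + length))
        start += length
    return {s: "".join(p) for s, p in pieces.items()}, partitions


gene1 = {"strainA": "ATG", "strainB": "ATC"}
gene2 = {"strainA": "GG", "strainC": "GA"}
supermatrix, parts = concatenate_alignments([gene1, gene2])
```

    Here `supermatrix["strainB"]` is `"ATC--"` because strainB is missing from the second gene, and `parts` records that the genes occupy columns 0 to 3 and 3 to 5.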

    Additional file 1 of Genome rearrangements and selection in multi-chromosome bacteria Burkholderia spp.

    Figure S1. The number of new genes added to the pangenome upon addition of new strains. (a) Burkholderia spp., (b) B. pseudomallei, and (c) B. mallei. The number of new genes is plotted as a function of the number (n) of strains sequentially added (see the model in [81]). For each n, points are the values obtained for different strain combinations; red symbols are the averages of these values. The superimposed line is a fit with a decaying power law, y = A·n^B. (PDF 1224 kb)
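
    The caption does not say how the power law was fitted. A minimal sketch, assuming ordinary least squares on the log-transformed averages (log y = log A + B log n); a nonlinear least-squares fit on the raw counts would weight the points differently and could give somewhat different A and B. The function names and the synthetic, noise-free data are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def average_by_n(points: Dict[int, List[float]]) -> Dict[int, float]:
    """Average the values obtained for different strain combinations at each n."""
    return {n: mean(values) for n, values in sorted(points.items())}


def fit_power_law(ns: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = A * n**B by linear regression of log y on log n.

    Needs every y > 0 and at least two distinct n; B < 0 means decay.
    """
    xs = [math.log(n) for n in ns]
    log_ys = [math.log(y) for y in ys]
    mx, my = mean(xs), mean(log_ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (ly - my) for x, ly in zip(xs, log_ys))
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return a, b


# Synthetic averages generated from A = 500, B = -0.8 (three identical values per n).
averages = average_by_n({n: [500.0 * n ** -0.8] * 3 for n in range(2, 11)})
a_hat, b_hat = fit_power_law(list(averages), list(averages.values()))
```

    On these noise-free values the fit recovers A = 500 and B = -0.8 up to rounding; with real strain combinations the scatter around each average would make the estimates depend on the fitting method chosen.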

    Additional file 6 of Genome rearrangements and selection in multi-chromosome bacteria Burkholderia spp.

    Figure S6. Whole-genome alignments of cepacia strains that were not included in the rearrangement analysis due to likely artifacts of the genome assembly. (a) Burkholderia sp. 383 and B. cepacia strain LO6, (b) Burkholderia sp. 383 and B. contaminans strain MS14, (c) Burkholderia sp. 383 and B. cenocepacia strain 895, (d) B. cepacia strain LO6 and B. cenocepacia strain 895. (PDF 25,604 kb)