14 research outputs found

    State aggregation for fast likelihood computations in molecular evolution

    Motivation: Codon models are widely used to identify the signature of selection at the molecular level and to test for changes in selective pressure during the evolution of protein-coding genes. The large state space of the Markov processes used to model codon evolution makes these models difficult to apply to large biological datasets. We propose here to use state aggregation to reduce the state space of codon models and thus improve the computational performance of likelihood estimation on these models. Results: We show that this heuristic speeds up the computations of the M0 and branch-site models by up to 6.8 times. We also show through simulations that state aggregation does not introduce a detectable bias. We analysed a real dataset and show that aggregation yields predictions highly correlated with those of the full likelihood computations. Finally, state aggregation is a very general approach that can be applied to any model based on a continuous-time Markov process with a large state space, such as amino acid and coevolution models. We therefore discuss different ways to apply state aggregation to Markov models used in phylogenetics. Availability: The heuristic is implemented in the godon package (https://bitbucket.org/Davydov/godon) and in a version of FastCodeML (https://gitlab.isb-sib.ch/phylo/fastcodeml).
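    The state-aggregation idea can be illustrated with a minimal Python sketch. It assumes one common construction, not necessarily the exact heuristic implemented in godon or FastCodeML: at a given alignment column, each observed state keeps its own row and column of the rate matrix, all unobserved states are lumped into a single aggregated state, and rates between blocks are averaged with weights from the stationary distribution pi. For a codon model this would shrink the 61 x 61 matrix of sense codons (standard genetic code) to a few rows per site. The names `aggregate_generator`, `observed_partition` and `jukes_cantor` are hypothetical, and a 4-state Jukes-Cantor matrix stands in for a codon model to keep the example small.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def aggregate_generator(q: Matrix, pi: Sequence[float],
                        blocks: Sequence[Sequence[int]]) -> Matrix:
    """Lump the states of a CTMC rate matrix `q` into `blocks`.

    Rate from block I to block J is a pi-weighted average:
        Q_agg[I][J] = sum_{i in I} pi[i] * sum_{j in J} q[i][j] / sum_{i in I} pi[i]
    Exact for chains that are lumpable w.r.t. `blocks`; an approximation otherwise.
    """
    agg: Matrix = []
    for block_i in blocks:
        weight = sum(pi[i] for i in block_i)  # stationary mass of block I
        row = []
        for block_j in blocks:
            # Total stationary probability flow from block I into block J.
            flow = sum(pi[i] * q[i][j] for i in block_i for j in block_j)
            row.append(flow / weight)
        agg.append(row)
    return agg


def observed_partition(n_states: int, observed: Sequence[int]) -> List[List[int]]:
    """Hypothetical per-site partition: observed states stay separate,
    all unobserved states are lumped into one extra block."""
    kept = sorted(set(observed))
    rest = [s for s in range(n_states) if s not in kept]
    blocks = [[s] for s in kept]
    if rest:
        blocks.append(rest)
    return blocks


def jukes_cantor(n: int = 4) -> Matrix:
    """Toy symmetric rate matrix with a total exit rate of 1 per state."""
    off = 1.0 / (n - 1)
    return [[-1.0 if i == j else off for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    q = jukes_cantor()
    pi = [0.25] * 4
    blocks = observed_partition(4, observed=[0, 1])  # e.g. only A and C seen at this site
    for row in aggregate_generator(q, pi, blocks):
        print([round(x, 4) for x in row])
```

    The pi-weighted average keeps every row of the aggregated matrix summing to zero and makes the summed block masses its stationary distribution. It is exact only when the chain is lumpable with respect to the chosen partition (the symmetric Jukes-Cantor matrix is); for a codon model it is an approximation, which is why checking for bias through simulations matters.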

    Large-Scale Comparative Analysis of Codon Models Accounting for Protein and Nucleotide Selection

    There are numerous sources of variation in the rate of synonymous substitutions within genes, such as direct selection on the nucleotide sequence or mutation rate variation. Yet scans for positive selection rely on codon models that assume an effectively neutral synonymous substitution rate, constant across the sites of each gene. Here we perform a large-scale comparison of approaches that incorporate codon substitution rate variation and propose our own simple yet effective modification of existing models. We find strong effects of substitution rate variation on positive selection inference: more than 70% of the genes detected by the classical branch-site model are presumably false positives caused by the incorrect assumption of a uniform synonymous substitution rate. We propose a new model that is strongly favored by the data while remaining computationally tractable. With the new model we can capture signatures of nucleotide-level selection acting on translation initiation and on splicing sites within the coding region. Finally, we show that rate variation is highest in highly recombining regions, and we propose that recombination and mutation rate variation, such as a high CpG mutation rate, are the two main sources of nucleotide rate variation. While we detect fewer genes under positive selection in Drosophila than without rate variation, the genes we do detect contain a stronger signal of adaptation of dynein, which could be associated with Wolbachia infection. We provide software to perform positive selection analysis using the new model.

    Duplication history and molecular evolution of the rbcS multigene family in angiosperms

    The rbcS multigene family evolved through complex duplication events leading to species-specific gene copies. Selection and coevolution with rbcL constrained rbcS evolution, thereby limiting the divergence of each gene copy.

    State aggregation for fast likelihood computations in molecular evolution

    Motivation: Codon models are widely used to identify the signature of selection at the molecular level and to test for changes in selective pressure during the evolution of protein-coding genes. The large state space of the Markov processes used to model codon evolution makes these models difficult to apply to large biological datasets. We propose here to use state aggregation to reduce the state space of codon models and thus improve the computational performance of likelihood estimation on these models. Results: We show that this heuristic speeds up the computations of the M0 and branch-site models by up to 6.8 times. We also show through simulations that state aggregation does not introduce a detectable bias. We analyzed a real dataset and show that aggregation yields predictions highly correlated with those of the full likelihood computations. Finally, state aggregation is a very general approach that can be applied to any model based on a continuous-time Markov process with a large state space, such as amino acid and coevolution models. We therefore discuss different ways to apply state aggregation to Markov models used in phylogenetics. Availability and Implementation: The heuristic is implemented in the godon package (https://bitbucket.org/Davydov/godon) and in a version of FastCodeML (https://gitlab.isb-sib.ch/phylo/fastcodeml). Supplementary information: Supplementary data are available at Bioinformatics online.

    Genome rearrangements and selection in multi-chromosome bacteria Burkholderia spp.

    Background: The genus Burkholderia consists of species that occupy remarkably diverse ecological niches. Its best-known members are the important pathogens B. mallei and B. pseudomallei, which cause glanders and melioidosis, respectively. Burkholderia genomes are unusual due to their multichromosomal organization. Results: We performed an integrated genomic analysis of 127 Burkholderia strains. The pan-genome is open, with saturation expected between 86,000 and 88,000 genes. The reconstructed rearrangements indicate a strong avoidance of intra-replichore inversions, likely caused by selection against the transfer of large groups of genes between the leading and the lagging strands. Translocated genes also tend to retain their position in the leading or the lagging strand, and this selection is stronger for large syntenies. Integrated reconstruction of chromosome rearrangements in the context of the strain phylogeny reveals parallel rearrangements that may indicate inversion-based phase variation and the integration of new genomic islands. In particular, we detected parallel inversions in the second chromosomes of B. pseudomallei with breakpoints formed by genes encoding membrane components of a multidrug resistance complex, which may be linked to a phase variation mechanism. Two genomic islands spreading horizontally between chromosomes were detected in the B. cepacia group. Conclusions: This study demonstrates the power of integrated analysis of pan-genomes, chromosome rearrangements, and selection regimes. Non-random inversion patterns indicate selective pressure. Inversions are particularly frequent in the recent pathogen B. mallei and, together with periods of positive selection on other branches, may indicate adaptation to new niches. One such adaptation could be a possible phase variation mechanism in B. pseudomallei.

    Evolution of the protein stoichiometry in the L12 stalk of bacterial and organellar ribosomes

    The emergence of ribosomes and translation factors is central to understanding the origin of life. Recruitment of translation factors to bacterial ribosomes is mediated by the L12 stalk, composed of protein L10 and several copies of protein L12, the only multi-copy protein of the ribosome. Here we predict the stoichiometry of the L12 stalk for more than 1,200 bacteria, mitochondria and chloroplasts by computational analysis, and validate the predictions by quantitative mass spectrometry. The majority of bacteria have L12 stalks that bind four or six copies of L12, largely independent of the taxonomic group or living conditions of the bacteria, whereas some cyanobacteria have eight copies. Mitochondrial and chloroplast ribosomes can accommodate six copies of L12. The last universal common ancestor probably had six molecules of L12 bound to L10. Changes in stalk composition provide a unique possibility to trace the evolution of the protein components of the ribosome.

    Tumor-agnostic transcriptome-based classifier identifies spatial infiltration patterns of CD8+ T cells in the tumor microenvironment and predicts clinical outcome in early-phase and late-phase clinical trials

    Background: The immune status of a patient’s tumor microenvironment (TME) may guide therapeutic interventions with cancer immunotherapy and help identify potential resistance mechanisms. Currently, patients’ immune status is mostly classified based on CD8+ tumor-infiltrating lymphocytes. There is an unmet need for comparable and reliable precision immunophenotyping tools that would facilitate treatment-relevant clinical decision-making and the understanding of how to overcome resistance mechanisms. Methods: We systematically analyzed the CD8 immunophenotype of 2023 patients from 14 phase I–III clinical trials using immunohistochemistry (IHC) and additionally profiled gene expression by RNA sequencing (RNA-seq). Pathologists classified CD8 immunophenotypes into CD8-desert, CD8-excluded or CD8-inflamed tumors using CD8 IHC staining in epithelial and stromal areas of the tumor. Using regularized logistic regression, we developed an RNA-seq-based classifier as a surrogate for the IHC-based spatial classification of CD8+ tumor-infiltrating lymphocytes in the TME. Results: The CD8 immunophenotype and associated gene expression patterns varied across indications as well as between primary and metastatic lesions. Melanoma and kidney cancers were among the most strongly inflamed indications, whereas CD8-desert phenotypes were most abundant in liver metastases across all tumor types. A good correspondence between the transcriptome and the IHC-based evaluation enabled us to develop a 92-gene classifier that accurately predicted the IHC-based CD8 immunophenotype in primary and metastatic samples (area under the curve: inflamed = 0.846; excluded = 0.712; desert = 0.855). The newly developed classifier was prognostic in The Cancer Genome Atlas (TCGA) data and predictive in lung cancer: across TCGA, patients with predicted CD8-inflamed tumors showed prolonged overall survival (OS) versus patients with CD8-desert tumors (HR 0.88; 95% CI 0.80 to 0.97), and in non-small-cell lung cancer they showed longer OS on immune checkpoint inhibitor administration (phase III OAK study; HR 0.75; 95% CI 0.58 to 0.97). Conclusions: We provide a new precision immunophenotyping tool based on gene expression that reflects the spatial infiltration patterns of CD8+ lymphocytes in tumors. The classifier enables multiplex analyses and is easy to apply for retrospective, reverse translation approaches as well as for prospective patient enrichment to optimize the response to cancer immunotherapy.

    Energy barriers and driving forces in tRNA translocation through the ribosome

    During protein synthesis, tRNAs move through the ribosome from the aminoacyl site to the peptidyl site and then to the exit site. Here we investigate conformational motions during spontaneous translocation, using molecular dynamics simulations of 13 intermediate-translocation-state models obtained by combining Escherichia coli ribosome crystal structures with cryo-EM data. Resolving fast transitions between states, we find that tRNA motions govern the transition rates within the pre- and post-translocation states. Intersubunit rotations and L1-stalk motion exhibit fast intrinsic submicrosecond dynamics. The L1 stalk drives the tRNA from the peptidyl site and links intersubunit rotation to translocation. Displacement of tRNAs is controlled by 'sliding' and 'stepping' mechanisms involving conserved L16, L5 and L1 residues, thus ensuring binding to the ribosome despite large-scale tRNA movement. Our results complement structural data with a time axis, intrinsic transition rates and molecular forces, revealing correlated functional motions inaccessible by other means.