247 research outputs found
Back-translation for discovering distant protein homologies
Frameshift mutations in protein-coding DNA sequences produce a drastic change
in the resulting protein sequence, which prevents classic protein alignment
methods from revealing the proteins' common origin. Moreover, when a large
number of substitutions are additionally involved in the divergence, the
homology detection becomes difficult even at the DNA level. To cope with this
situation, we propose a novel method to infer distant homology relations of two
proteins, that accounts for frameshift and point mutations that may have
affected the coding sequences. We design a dynamic programming alignment
algorithm over memory-efficient graph representations of the complete set of
putative DNA sequences of each protein, with the goal of determining the two
putative DNA sequences which have the best scoring alignment under a powerful
scoring system designed to reflect the most probable evolutionary process. This
allows us to uncover evolutionary information that is not captured by
traditional alignment methods, which is confirmed by biologically significant
examples.Comment: The 9th International Workshop in Algorithms in Bioinformatics
(WABI), Philadelphia : \'Etats-Unis d'Am\'erique (2009
A Model-Based Analysis of GC-Biased Gene Conversion in the Human and Chimpanzee Genomes
GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts about 1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. They are also significantly enriched for disease-associated polymorphisms, suggesting that they contribute to the fixation of deleterious alleles. The gBGC tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages. They supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available. © 2013 Capra et al
Selective Constraints on Amino Acids Estimated by a Mechanistic Codon Substitution Model with Multiple Nucleotide Changes
Empirical substitution matrices represent the average tendencies of
substitutions over various protein families by sacrificing gene-level
resolution. We develop a codon-based model, in which mutational tendencies of
codon, a genetic code, and the strength of selective constraints against amino
acid replacements can be tailored to a given gene. First, selective constraints
averaged over proteins are estimated by maximizing the likelihood of each 1-PAM
matrix of empirical amino acid (JTT, WAG, and LG) and codon (KHG) substitution
matrices. Then, selective constraints specific to given proteins are
approximated as a linear function of those estimated from the empirical
substitution matrices.
Akaike information criterion (AIC) values indicate that a model allowing
multiple nucleotide changes fits the empirical substitution matrices
significantly better. Also, the ML estimates of transition-transversion bias
obtained from these empirical matrices are not so large as previously
estimated. The selective constraints are characteristic of proteins rather than
species. However, their relative strengths among amino acid pairs can be
approximated not to depend very much on protein families but amino acid pairs,
because the present model, in which selective constraints are approximated to
be a linear function of those estimated from the JTT/WAG/LG/KHG matrices, can
provide a good fit to other empirical substitution matrices including cpREV for
chloroplast proteins and mtREV for vertebrate mitochondrial proteins.
The present codon-based model with the ML estimates of selective constraints
and with adjustable mutation rates of nucleotide would be useful as a simple
substitution model in ML and Bayesian inferences of molecular phylogenetic
trees, and enables us to obtain biologically meaningful information at both
nucleotide and amino acid levels from codon and protein sequences.Comment: Table 9 in this article includes corrections for errata in the Table
9 published in 10.1371/journal.pone.0017244. Supporting information is
attached at the end of the article, and a computer-readable dataset of the ML
estimates of selective constraints is available from
10.1371/journal.pone.001724
PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals
Recent statistical analyses suggest that sequencing of pooled samples provides a cost effective approach to determine genome-wide population genetic parameters. Here we introduce PoPoolation, a toolbox specifically designed for the population genetic analysis of sequence data from pooled individuals. PoPoolation calculates estimates of θWatterson, θπ, and Tajima's D that account for the bias introduced by pooling and sequencing errors, as well as divergence between species. Results of genome-wide analyses can be graphically displayed in a sliding window plot. PoPoolation is written in Perl and R and it builds on commonly used data formats. Its source code can be downloaded from http://code.google.com/p/popoolation/. Furthermore, we evaluate the influence of mapping algorithms, sequencing errors, and read coverage on the accuracy of population genetic parameter estimates from pooled data
Extensive and coordinated transcription of noncoding RNAs within cell-cycle promoters
Transcription of long noncoding RNAs (lncRNAs) within gene regulatory elements can modulate gene activity in response to external stimuli, but the scope and functions of such activity are not known. Here we use an ultrahigh-density array that tiles the promoters of 56 cell-cycle genes to interrogate 108 samples representing diverse perturbations. We identify 216 transcribed regions that encode putative lncRNAs, many with RT-PCR–validated periodic expression during the cell cycle, show altered expression in human cancers and are regulated in expression by specific oncogenic stimuli, stem cell differentiation or DNA damage. DNA damage induces five lncRNAs from the CDKN1A promoter, and one such lncRNA, named PANDA, is induced in a p53-dependent manner. PANDA interacts with the transcription factor NF-YA to limit expression of pro-apoptotic genes; PANDA depletion markedly sensitized human fibroblasts to apoptosis by doxorubicin. These findings suggest potentially widespread roles for promoter lncRNAs in cell-growth control.National Institutes of Health (U.S.)National Institute of Arthritis and Musculoskeletal and Skin Diseases (U.S.) (NIAMS) (K08-AR054615))National Cancer Institute (U.S.) (NIH/(NCI) (R01-CA118750))National Cancer Institute (U.S.) (NIH/(NCI) R01-CA130795))Juvenile Diabetes Research Foundation InternationalAmerican Cancer SocietyHoward Hughes Medical Institute (Early career scientist)Stanford University (Graduate Fellowship)National Science Foundation (U.S.) (Graduate Research Fellowship)United States. Dept. of Defense (National Defense Science and Engineering Graduate Fellowship
Vasorin-deficient mice display disturbed vitamin D and mineral homeostasis in combination with a low bone mass phenotype
Vasorin (Vasn) is a pleiotropic molecule involved in various physiological and pathological conditions, including cancer. Vasn has also been detected in bone cells of developing skeletal tissues but no function for Vasn in bone metabolism has been implicated yet. Therefore, this study aimed to investigate if Vasn plays a significant role in bone biology. First, we investigated tissue distribution of Vasn expression, using lacZ knock-in reporter mice. We detected clear Vasn expression in skeletal elements of postnatal mice. In particular, osteocytes and bone forming osteoblasts showed high expression of Vasn, while the bone marrow was devoid of signal. Vasn knockout mice (Vasn−/−) displayed postnatal growth retardation and died after four weeks. MicroCT analysis of femurs from 22- to 25-day-old Vasn−/− mice demonstrated reduced trabecular and cortical bone volume corresponding to a low bone mass phenotype. Ex vivo bone marrow cultures demonstrated that osteoclast differentiation and activity were not affected by Vasn deficiency. However, osteogenesis of Vasn−/− bone marrow cultures was disturbed, resulting in lower numbers of alkaline phosphate positive colonies, impaired mineralization and lower expression of osteoblast marker genes. In addition to the bone phenotype, these mice developed a vitamin D3-related phenotype with a strongly reduced circulating 25-hydroxyvitamin D3 and 1,25-dihydroxyvitamin D3 and urinary loss of vitamin D binding protein. In conclusion, Vasn-deficient mice suffer from severe disturbances in bone metabolism and mineral homeostasis.</p
webPRANK: a phylogeny-aware multiple sequence aligner with interactive alignment browser
<p>Abstract</p> <p>Background</p> <p>Phylogeny-aware progressive alignment has been found to perform well in phylogenetic alignment benchmarks and to produce superior alignments for the inference of selection on codon sequences. Its implementation in the PRANK alignment program package also allows modelling of complex evolutionary processes and inference of posterior probabilities for sequence sites evolving under each distinct scenario, either simultaneously with the alignment of sequences or as a post-processing step for an existing alignment. This has led to software with many advanced features, and users may find it difficult to generate optimal alignments, visualise the full information in their alignment results, or post-process these results, e.g. by objectively selecting subsets of alignment sites.</p> <p>Results</p> <p>We have created a web server called webPRANK that provides an easy-to-use interface to the PRANK phylogeny-aware alignment algorithm. The webPRANK server supports the alignment of DNA, protein and codon sequences as well as protein-translated alignment of cDNAs, and includes built-in structure models for the alignment of genomic sequences. The resulting alignments can be exported in various formats widely used in evolutionary sequence analyses. The webPRANK server also includes a powerful web-based alignment browser for the visualisation and post-processing of the results in the context of a cladogram relating the sequences, allowing (e.g.) removal of alignment columns with low posterior reliability. In addition to <it>de novo </it>alignments, webPRANK can be used for the inference of ancestral sequences with phylogenetically realistic gap patterns, and for the annotation and post-processing of existing alignments. The webPRANK server is freely available on the web at <url>http://tinyurl.com/webprank</url> .</p> <p>Conclusions</p> <p>The webPRANK server incorporates phylogeny-aware multiple sequence alignment, visualisation and post-processing in an easy-to-use web interface. It widens the user base of phylogeny-aware multiple sequence alignment and allows the performance of all alignment-related activity for small sequence analysis projects using only a standard web browser.</p
Correcting the Bias of Empirical Frequency Parameter Estimators in Codon Models
Markov models of codon substitution are powerful inferential tools for studying biological processes such as natural selection and preferences in amino acid substitution. The equilibrium character distributions of these models are almost always estimated using nucleotide frequencies observed in a sequence alignment, primarily as a matter of historical convention. In this note, we demonstrate that a popular class of such estimators are biased, and that this bias has an adverse effect on goodness of fit and estimates of substitution rates. We propose a “corrected” empirical estimator that begins with observed nucleotide counts, but accounts for the nucleotide composition of stop codons. We show via simulation that the corrected estimates outperform the de facto standard estimates not just by providing better estimates of the frequencies themselves, but also by leading to improved estimation of other parameters in the evolutionary models. On a curated collection of sequence alignments, our estimators show a significant improvement in goodness of fit compared to the approach. Maximum likelihood estimation of the frequency parameters appears to be warranted in many cases, albeit at a greater computational cost. Our results demonstrate that there is little justification, either statistical or computational, for continued use of the -style estimators
A model-independent approach to infer hierarchical codon substitution dynamics
<p>Abstract</p> <p>Background</p> <p>Codon substitution constitutes a fundamental process in molecular biology that has been studied extensively. However, prior studies rely on various assumptions, e.g. regarding the relevance of specific biochemical properties, or on conservation criteria for defining substitution groups. Ideally, one would instead like to analyze the substitution process in terms of raw dynamics, independently of underlying system specifics. In this paper we propose a method for doing this by identifying groups of codons and amino acids such that these groups imply closed dynamics. The approach relies on recently developed spectral and agglomerative techniques for identifying hierarchical organization in dynamical systems.</p> <p>Results</p> <p>We have applied the techniques on an empirically derived Markov model of the codon substitution process that is provided in the literature. Without system specific knowledge of the substitution process, the techniques manage to "blindly" identify multiple levels of dynamics; from amino acid substitutions (via the standard genetic code) to higher order dynamics on the level of amino acid groups. We hypothesize that the acquired groups reflect earlier versions of the genetic code.</p> <p>Conclusions</p> <p>The results demonstrate the applicability of the techniques. Due to their generality, we believe that they can be used to coarse grain and identify hierarchical organization in a broad range of other biological systems and processes, such as protein interaction networks, genetic regulatory networks and food webs.</p
- …
