387 research outputs found

    Simulation and analysis of in vitro DNA evolution

    We study theoretically the in vitro evolution of a DNA sequence by binding to a transcription factor. Using a simple model of protein-DNA binding and available binding constants for the Mnt protein, we perform large-scale, realistic simulations of evolution starting from a single DNA sequence. We identify different parameter regimes characterized by distinct evolutionary behaviors. For each regime we find analytical estimates which agree well with simulation results. For small population sizes, the DNA evolutionary path is a random walk on a smooth landscape, while for large population sizes the evolution dynamics are well described by a mean-field theory. We also study how the details of the DNA-protein interaction affect the evolution. Comment: 11 pages, 11 figures. Submitted to PNA

    Exploring the DNA-recognition potential of homeodomains

    The recognition potential of most families of DNA-binding domains (DBDs) remains relatively unexplored. Homeodomains (HDs), like many other families of DBDs, display limited diversity in their preferred recognition sequences. To explore the recognition potential of HDs, we utilized a bacterial selection system to isolate HD variants, from a randomized library, that are compatible with each of the 64 possible 3′ triplet sites (i.e., TAANNN). The majority of these selections yielded sets of HDs with overrepresented residues at specific recognition positions, implying the selection of specific binders. The DNA-binding specificity of 151 representative HD variants was subsequently characterized, identifying HDs that preferentially recognize 44 of these target sites. Many of these variants contain novel combinations of specificity determinants that are uncommon or absent in extant HDs. These novel determinants, when grafted into different HD backbones, produce a corresponding alteration in specificity. This information was used to create more explicit HD recognition models, which can inform the prediction of transcriptional regulatory networks for extant HDs or the engineering of HDs with novel DNA-recognition potential. The diversity of recovered HD recognition sequences raises important questions about the fitness barrier that restricts the evolution of alternate recognition modalities in natural systems

    Bayesian Centroid Estimation for Motif Discovery

    Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We present a Bayesian model that is an extended version of the model adopted by the Gibbs motif sampler, and propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the maximum a posteriori estimator. Comment: 24 pages, 9 figures

    Predicting the binding preference of transcription factors to individual DNA k-mers

    Motivation: Recognition of specific DNA sequences is a central mechanism by which transcription factors (TFs) control gene expression. Many TF-binding preferences, however, are unknown or poorly characterized, in part due to the difficulty associated with determining their specificity experimentally, and an incomplete understanding of the mechanisms governing sequence specificity. New techniques that estimate the affinity of TFs to all possible k-mers provide a new opportunity to study DNA–protein interaction mechanisms, and may facilitate inference of binding preferences for members of a given TF family when such information is available for other family members. Results: We employed a new dataset consisting of the relative preferences of mouse homeodomains for all eight-base DNA sequences in order to ask how well we can predict the binding profiles of homeodomains when only their protein sequences are given. We evaluated a panel of standard statistical inference techniques, as well as variations of the protein features considered. Nearest neighbour among functionally important residues emerged among the most effective methods. Our results underscore the complexity of TF–DNA recognition, and suggest a rational approach for future analyses of TF families. Supplementary information: Supplementary data are available at Bioinformatics online. Funding: Canadian Institutes of Health Research; Ontario Research Fund; National Institutes of Health (U.S.); National Human Genome Research Institute (U.S.)

    An improved predictive recognition model for Cys2-His2 zinc finger proteins

    Cys2-His2 zinc finger proteins (ZFPs) are the largest family of transcription factors in higher metazoans. They also represent the most diverse family with regard to the composition of their recognition sequences. Although there are a number of ZFPs with characterized DNA-binding preferences, the specificity of the vast majority of ZFPs is unknown and cannot be directly inferred by homology due to the diversity of recognition residues present within individual fingers. Given the large number of unique zinc fingers and assemblies present across eukaryotes, a comprehensive predictive recognition model that could accurately estimate the DNA-binding specificity of any ZFP based on its amino acid sequence would have great utility. Toward this goal, we have used the DNA-binding specificities of 678 two-finger modules from both natural and artificial sources to construct a random forest-based predictive model for ZFP recognition. We find that our recognition model outperforms previously described determinant-based recognition models for ZFPs, and can successfully estimate the specificity of naturally occurring ZFPs with previously defined specificities.

    Formation of regulatory modules by local sequence duplication

    Turnover of regulatory sequence and function is an important part of molecular evolution. But what are the modes of sequence evolution leading to rapid formation and loss of regulatory sites? Here, we show that a large fraction of neighboring transcription factor binding sites in the fly genome have formed from a common sequence origin by local duplications. This mode of evolution is found to produce regulatory information: duplications can seed new sites in the neighborhood of existing sites. Duplicate seeds evolve subsequently by point mutations, often towards binding a different factor than their ancestral neighbor sites. These results are based on a statistical analysis of 346 cis-regulatory modules in the Drosophila melanogaster genome, and a comparison set of intergenic regulatory sequence in Saccharomyces cerevisiae. In fly regulatory modules, pairs of binding sites show significantly enhanced sequence similarity up to distances of about 50 bp. We analyze these data in terms of an evolutionary model with two distinct modes of site formation: (i) evolution from independent sequence origin and (ii) divergent evolution following duplication of a common ancestor sequence. Our results suggest that pervasive formation of binding sites by local sequence duplications distinguishes the complex regulatory architecture of higher eukaryotes from the simpler architecture of unicellular organisms

    Guiding the Design of Synthetic DNA-Binding Molecules with Massively Parallel Sequencing

    Genomic applications of DNA-binding molecules require an unbiased knowledge of their high affinity sites. We report the high-throughput analysis of pyrrole-imidazole polyamide DNA-binding specificity in a 10^(12)-member DNA sequence library using affinity purification coupled with massively parallel sequencing. We find that even within this broad context, the canonical pairing rules are remarkably predictive of polyamide DNA-binding specificity. However, this approach also allows identification of unanticipated high affinity DNA-binding sites in the reverse orientation for polyamides containing β/Im pairs. These insights allow the redesign of hairpin polyamides with different turn units capable of distinguishing 5′-WCGCGW-3′ from 5′-WGCGCW-3′. Overall, this study displays the power of high-throughput methods to aid the optimal targeting of sequence-specific minor groove binding molecules, an essential underpinning for biological and nanotechnological applications

    Sequence-selective detection of double-stranded DNA sequences using pyrrole-imidazole polyamide microarrays

    We describe a microarray format that can detect double-stranded DNA sequences with a high degree of sequence selectivity. Cyclooctyne-derivatized pyrrole-imidazole polyamides were immobilized on azide-modified glass substrates using microcontact printing and a strain-promoted azide-alkyne cycloaddition (SPAAC) reaction. These polyamide-immobilized substrates selectively detected a seven-base-pair binding site incorporated within a double-stranded oligodeoxyribonucleotide sequence even in the presence of an excess of a sequence with a single-base-pair mismatch.