773 research outputs found
Motif Discovery through Predictive Modeling of Gene Regulation
We present MEDUSA, an integrative method for learning motif models of
transcription factor binding sites by incorporating promoter sequence and gene
expression data. We use a modern large-margin machine learning approach, based
on boosting, to enable feature selection from the high-dimensional search space
of candidate binding sequences while avoiding overfitting. At each iteration of
the algorithm, MEDUSA builds a motif model whose presence in the promoter
region of a gene, coupled with activity of a regulator in an experiment, is
predictive of differential expression. In this way, we learn motifs that are
functional and predictive of regulatory response rather than motifs that are
simply overrepresented in promoter sequences. Moreover, MEDUSA produces a model
of the transcriptional control logic that can predict the expression of any
gene in the organism, given the sequence of the promoter region of the target
gene and the expression state of a set of known or putative transcription
factors and signaling molecules. Each motif model is either a -length
sequence, a dimer, or a PSSM that is built by agglomerative probabilistic
clustering of sequences with similar boosting loss. By applying MEDUSA to a set
of environmental stress response expression data in yeast, we learn motifs
whose ability to predict differential expression of target genes outperforms
motifs from the TRANSFAC dataset and from a previously published candidate set
of PSSMs. We also show that MEDUSA retrieves many experimentally confirmed
binding sites associated with environmental stress response from the
literature.Comment: RECOMB 200
An intuitionistic approach to scoring DNA sequences against transcription factor binding site motifs
Background: Transcription factors (TFs) control transcription by binding to specific regions of DNA called transcription factor binding sites (TFBSs). The identification of TFBSs is a crucial problem in computational biology and includes the subtask of predicting the location of known TFBS motifs in a given DNA sequence. It has previously been shown that, when scoring matches to known TFBS motifs, interdependencies between positions within a motif should be taken into account. However, this remains a challenging task owing to the fact that sequences similar to those of known TFBSs can occur by chance with a relatively high frequency. Here we present a new method for matching sequences to TFBS motifs based on intuitionistic fuzzy sets (IFS) theory, an approach that has been shown to be particularly appropriate for tackling problems that embody a high degree of uncertainty.
Results: We propose SCintuit, a new scoring method for measuring sequence-motif affinity based on IFS theory. Unlike existing methods that consider dependencies between positions, SCintuit is designed to prevent overestimation of less conserved positions of TFBSs. For a given pair of bases, SCintuit is computed not only as a function of their combined probability of occurrence, but also taking into account the individual importance of each single base at its corresponding position. We used SCintuit to identify known TFBSs in DNA sequences. Our method provides excellent results when dealing with both synthetic and real data, outperforming the sensitivity and the specificity of two existing methods in all the experiments we performed.
Conclusions: The results show that SCintuit improves the prediction quality for TFs of the existing approaches without compromising sensitivity. In addition, we show how SCintuit can be successfully applied to real research problems. In this study the reliability of the IFS theory for motif discovery tasks is proven
Quantitative model for inferring dynamic regulation of the tumour suppressor gene p53
Background: The availability of various "omics" datasets creates a prospect of performing the study of genome-wide genetic regulatory networks. However, one of the major challenges of using mathematical models to infer genetic regulation from microarray datasets is the lack of information for protein concentrations and activities. Most of the previous researches were based on an assumption that the mRNA levels of a gene are consistent with its protein activities, though it is not always the case. Therefore, a more sophisticated modelling framework together with the corresponding inference methods is needed to accurately estimate genetic regulation from "omics" datasets.
Results: This work developed a novel approach, which is based on a nonlinear mathematical model, to infer genetic regulation from microarray gene expression data. By using the p53 network as a test system, we used the nonlinear model to estimate the activities of transcription factor (TF) p53 from the expression levels of its target genes, and to identify the activation/inhibition status of p53 to its target genes. The predicted top 317 putative p53 target genes were supported by DNA sequence analysis. A comparison between our prediction and the other published predictions of p53 targets suggests that most of putative p53 targets may share a common depleted or enriched sequence signal on their upstream non-coding region.
Conclusions: The proposed quantitative model can not only be used to infer the regulatory relationship between TF and its down-stream genes, but also be applied to estimate the protein activities of TF from the expression levels of its target genes
Application of regulatory sequence analysis and metabolic network analysis to the interpretation of gene expression data
We present two complementary approaches for the interpretation of clusters of
co-regulated genes, such as those obtained from DNA chips and related methods.
Starting from a cluster of genes with similar expression profiles, two basic
questions can be asked:
1. Which mechanism is responsible for the coordinated transcriptional response
of the genes? This question is approached by extracting motifs that are shared
between the upstream sequences of these genes. The motifs extracted are putative
cis-acting regulatory elements.
2. What is the physiological meaning for the cell to express together these
genes? One way to answer the question is to search for potential metabolic
pathways that could be catalyzed by the products of the genes. This can be
done by selecting the genes from the cluster that code for enzymes, and trying
to assemble the catalyzed reactions to form metabolic pathways.
We present tools to answer these two questions, and we illustrate their use with
selected examples in the yeast Saccharomyces cerevisiae. The tools are available
on the web (http://ucmb.ulb.ac.be/bioinformatics/rsa-tools/;
http://www.ebi.ac.uk/research/pfbp/; http://www.soi.city.ac.uk/~msch/)
Features of mammalian microRNA promoters emerge from polymerase II chromatin immunoprecipitation data
Background: MicroRNAs (miRNAs) are short, non-coding RNA regulators of protein coding genes. miRNAs play a very important role in diverse biological processes and various diseases. Many algorithms are able to predict miRNA genes and their targets, but their transcription regulation is still under investigation. It is generally believed that intragenic miRNAs (located in introns or exons of protein coding genes) are co-transcribed with their host genes and most intergenic miRNAs transcribed from their own RNA polymerase II (Pol II) promoter. However, the length of the primary transcripts and promoter organization is currently unknown. Methodology: We performed Pol II chromatin immunoprecipitation (ChIP)-chip using a custom array surrounding regions of known miRNA genes. To identify the true core transcription start sites of the miRNA genes we developed a new tool (CPPP). We showed that miRNA genes can be transcribed from promoters located several kilobases away and that their promoters share the same general features as those of protein coding genes. Finally, we found evidence that as many as 26% of the intragenic miRNAs may be transcribed from their own unique promoters. Conclusion: miRNA promoters have similar features to those of protein coding genes, but miRNA transcript organization is more complex. © 2009 Corcoran et al
EndoNet: an information resource about regulatory networks of cell-to-cell communication†
EndoNet is an information resource about intercellular regulatory communication. It provides information about hormones, hormone receptors, the sources (i.e. cells, tissues and organs) where the hormones are synthesized and secreted, and where the respective receptors are expressed. The database focuses on the regulatory relations between them. An elementary communication is displayed as a causal link from a cell that secretes a particular hormone to those cells which express the corresponding hormone receptor and respond to the hormone. Whenever expression, synthesis and/or secretion of another hormone are part of this response, it renders the corresponding cell an internal node of the resulting network. This intercellular communication network coordinates the function of different organs. Therefore, the database covers the hierarchy of cellular organization of tissues and organs as it has been modeled in the Cytomer ontology, which has now been directly embedded into EndoNet. The user can query the database; the results can be used to visualize the intercellular information flow. A newly implemented hormone classification enables to browse the database and may be used as alternative entry point. EndoNet is accessible at: http://endonet.bioinf.med.uni-goettingen.de
Genetically Engineered Alginate Lyase-PEG Conjugates Exhibit Enhanced Catalytic Function and Reduced Immunoreactivity
Alginate lyase enzymes represent prospective biotherapeutic agents for treating bacterial infections, particularly in the cystic fibrosis airway. To effectively deimmunize one therapeutic candidate while maintaining high level catalytic proficiency, a combined genetic engineering-PEGylation strategy was implemented. Rationally designed, site-specific PEGylation variants were constructed by orthogonal maleimide-thiol coupling chemistry. In contrast to random PEGylation of the enzyme by NHS-ester mediated chemistry, controlled mono-PEGylation of A1-III alginate lyase produced a conjugate that maintained wild type levels of activity towards a model substrate. Significantly, the PEGylated variant exhibited enhanced solution phase kinetics with bacterial alginate, the ultimate therapeutic target. The immunoreactivity of the PEGylated enzyme was compared to a wild type control using in vitro binding studies with both enzyme-specific antibodies, from immunized New Zealand white rabbits, and a single chain antibody library, derived from a human volunteer. In both cases, the PEGylated enzyme was found to be substantially less immunoreactive. Underscoring the enzyme's potential for practical utility, >90% of adherent, mucoid, Pseudomonas aeruginosa biofilms were removed from abiotic surfaces following a one hour treatment with the PEGylated variant, whereas the wild type enzyme removed only 75% of biofilms in parallel studies. In aggregate, these results demonstrate that site-specific mono-PEGylation of genetically engineered A1-III alginate lyase yielded an enzyme with enhanced performance relative to therapeutically relevant metrics.Cystic Fibrosis Foundation (Research Development Program)National Center for Research Resources (U.S.) (P20RR018787-06
Occupancy maps of 208 chromatin-associated proteins in one human cell type
Transcription factors are DNA-binding proteins that have key roles in gene regulation. Genome-wide occupancy maps of transcriptional regulators are important for understanding gene regulation and its effects on diverse biological processes. However, only a minority of the more than 1,600 transcription factors encoded in the human genome has been assayed. Here we present, as part of the ENCODE (Encyclopedia of DNA Elements) project, data and analyses from chromatin immunoprecipitation followed by high-throughput sequencing (ChIP–seq) experiments using the human HepG2 cell line for 208 chromatin-associated proteins (CAPs). These comprise 171 transcription factors and 37 transcriptional cofactors and chromatin regulator proteins, and represent nearly one-quarter of CAPs expressed in HepG2 cells. The binding profiles of these CAPs form major groups associated predominantly with promoters or enhancers, or with both. We confirm and expand the current catalogue of DNA sequence motifs for transcription factors, and describe motifs that correspond to other transcription factors that are co-enriched with the primary ChIP target. For example, FOX family motifs are enriched in ChIP–seq peaks of 37 other CAPs. We show that motif content and occupancy patterns can distinguish between promoters and enhancers. This catalogue reveals high-occupancy target regions at which many CAPs associate, although each contains motifs for only a minority of the numerous associated transcription factors. These analyses provide a more complete overview of the gene regulatory networks that define this cell type, and demonstrate the usefulness of the large-scale production efforts of the ENCODE Consortium
- …
