
    Cancer somatic mutations cluster in a subset of regulatory sites predicted from the ENCODE data

    Background: Transcriptional regulation of gene expression is essential for cellular differentiation and function, and defects in the process are associated with cancer. The ENCODE project has mapped potential regulatory sites across the complete genome in many cell types, and these regions have been shown to harbour many of the somatic mutations that occur in cancer cells, suggesting that their effects may drive cancer initiation and development. The ENCODE data suggest a very large number of regulatory sites, and methods are needed to identify those that are most relevant and to connect them to the genes that they control. Methods: Predictive models of gene expression were developed by integrating the ENCODE data for regulation, including transcription factor binding and DNase I hypersensitivity, with RNA-seq data for gene expression. A penalized regression method was used to identify the most predictive potential regulatory sites for each transcript. Known cancer somatic mutations from the COSMIC database were mapped to potential regulatory sites, and we examined differences in the mapping frequencies between sites chosen in regulatory models and other (rejected) sites. The effects of potential confounders, for example replication timing, were considered. Results: Cancer somatic mutations preferentially occupy those regulatory regions chosen in our models as most predictive of gene expression. Conclusion: Our methods have identified a significantly reduced set of regulatory sites that are enriched in cancer somatic mutations and are more predictive of gene expression. This has significance for the mechanistic interpretation of cancer mutations, and the understanding of genetic regulation.

    Circulating microRNAs as novel biomarkers for diabetes mellitus.

    Diabetes mellitus is characterized by insulin secretion from pancreatic β cells that is insufficient to maintain blood glucose homeostasis. Autoimmune destruction of β cells results in type 1 diabetes mellitus, whereas conditions that reduce insulin sensitivity and negatively affect β-cell activities result in type 2 diabetes mellitus. Without proper management, patients with diabetes mellitus develop serious complications that reduce their quality of life and life expectancy. Biomarkers for early detection of the disease and identification of individuals at risk of developing complications would greatly improve the care of these patients. Small non-coding RNAs called microRNAs (miRNAs) control gene expression and participate in many physiopathological processes. Hundreds of miRNAs are actively or passively released in the circulation and can be used to evaluate health status and disease progression. Both type 1 diabetes mellitus and type 2 diabetes mellitus are associated with distinct modifications in the profile of miRNAs in the blood, which are sometimes detectable several years before the disease manifests. Moreover, circulating levels of certain miRNAs seem to be predictive of long-term complications. Technical and scientific obstacles still exist that need to be overcome, but circulating miRNAs might soon become part of the diagnostic arsenal to identify individuals at risk of developing diabetes mellitus and its devastating complications

    A Novel Bayesian DNA Motif Comparison Method for Clustering and Retrieval

    Characterizing the DNA-binding specificities of transcription factors is a key problem in computational biology that has been addressed by multiple algorithms. These usually take as input sequences that are putatively bound by the same factor and output one or more DNA motifs. A common practice is to apply several such algorithms simultaneously to improve coverage at the price of redundancy. In interpreting such results, two tasks are crucial: clustering of redundant motifs, and attributing the motifs to transcription factors by retrieval of similar motifs from previously characterized motif libraries. Both tasks inherently involve motif comparison. Here we present a novel method for comparing and merging motifs, based on Bayesian probabilistic principles. This method takes into account both the similarity in positional nucleotide distributions of the two motifs and their dissimilarity to the background distribution. We demonstrate the use of the new comparison method as a basis for motif clustering and retrieval procedures, and compare it to several commonly used alternatives. Our results show that the new method outperforms other available methods in accuracy and sensitivity. We incorporated the resulting motif clustering and retrieval procedures in a large-scale automated pipeline for analyzing DNA motifs. This pipeline integrates the results of various DNA motif discovery algorithms and automatically merges redundant motifs from multiple training sets into a coherent annotated library of motifs. Application of this pipeline to recent genome-wide transcription factor location data in S. cerevisiae successfully identified DNA motifs in a manner that is as good as semi-automated analysis reported in the literature. Moreover, we show how this analysis elucidates the mechanisms of condition-specific preferences of transcription factors

    Systematic identification of yeast cell cycle transcription factors using multiple data sources

    Background: The eukaryotic cell cycle is a complex process and is precisely regulated at many levels. Many genes specific to the cell cycle are regulated transcriptionally and are expressed just before they are needed. To understand the cell cycle process, it is important to identify the cell cycle transcription factors (TFs) that regulate the expression of cell cycle-regulated genes. Results: We developed a method to identify cell cycle TFs in yeast by integrating current ChIP-chip, mutant, transcription factor binding site (TFBS), and cell cycle gene expression data. We identified 17 cell cycle TFs, 12 of which are known cell cycle TFs, while the remaining five (Ash1, Rlm1, Ste12, Stp1, Tec1) are putative novel cell cycle TFs. For each cell cycle TF, we assigned specific cell cycle phases in which the TF functions and identified the time lag for the TF to exert regulatory effects on its target genes. We also identified 178 novel cell cycle-regulated genes, among which 59 have unknown functions, but they may now be annotated as cell cycle-regulated genes. Most of our predictions are supported by previous experimental or computational studies. Furthermore, a high confidence TF-gene regulatory matrix is derived as a byproduct of our method. Each TF-gene regulatory relationship in this matrix is supported by at least three data sources: gene expression, TFBS, and ChIP-chip and/or mutant data. We show that our method performs better than four existing methods for identifying yeast cell cycle TFs. Finally, an application of our method to different cell cycle gene expression datasets suggests that our method is robust. Conclusion: Our method is effective for identifying yeast cell cycle TFs and cell cycle-regulated genes. Many of our predictions are validated by the literature. Our study shows that integrating multiple data sources is a powerful approach to studying complex biological systems.

    Embedding mRNA Stability in Correlation Analysis of Time-Series Gene Expression Data

    Current methods for the identification of putatively co-regulated genes directly from gene expression time profiles are based on the similarity of the time profiles. Such association metrics, despite their central role in gene network inference and machine learning, have largely ignored the impact of dynamics or variation in mRNA stability. Here we introduce a simple, but powerful, new similarity metric called lead-lag R² that successfully accounts for the properties of gene dynamics, including varying mRNA degradation and delays. Using yeast cell-cycle time-series gene expression data, we demonstrate that the predictive power of lead-lag R² for the identification of co-regulated genes is significantly higher than that of standard similarity measures, thus allowing the selection of a large number of entirely new putatively co-regulated genes. Furthermore, the lead-lag metric can also be used to uncover the relationship between gene expression time-series and the dynamics of formation of multiple protein complexes. Remarkably, we found a high lead-lag R² value among genes coding for a transient complex.

    A Feature-Based Approach to Modeling Protein–DNA Interactions

    Transcription factor (TF) binding to its DNA target site is a fundamental regulatory interaction. The most common model used to represent TF binding specificities is a position specific scoring matrix (PSSM), which assumes independence between binding positions. However, in many cases, this simplifying assumption does not hold. Here, we present feature motif models (FMMs), a novel probabilistic method for modeling TF–DNA interactions, based on log-linear models. Our approach uses sequence features to represent TF binding specificities, where each feature may span multiple positions. We develop the mathematical formulation of our model and devise an algorithm for learning its structural features from binding site data. We also developed a discriminative motif finder, which discovers de novo FMMs that are enriched in target sets of sequences compared to background sets. We evaluate our approach on synthetic data and on the widely used TF chromatin immunoprecipitation (ChIP) dataset of Harbison et al. We then apply our algorithm to high-throughput TF ChIP data from mouse and human, reveal sequence features that are present in the binding specificities of mouse and human TFs, and show that FMMs explain TF binding significantly better than PSSMs. Our FMM learning and motif finder software are available at http://genie.weizmann.ac.il/
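
    To illustrate the independence assumption that the abstract above attributes to PSSMs, here is a minimal Python sketch of PSSM log-odds scoring. The motif probabilities, the uniform background, and the function names are hypothetical, chosen only for illustration; this is not the FMM method or the authors' software.

```python
import math

# Hypothetical 4-position motif: per-position nucleotide probabilities
# (illustrative values only, not taken from any real transcription factor).
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
# Assumed uniform background nucleotide distribution.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def pssm_score(site, motif=MOTIF, background=BACKGROUND):
    """Log2-odds score of a candidate site under a PSSM.

    Each position contributes its own term to the sum, which is exactly
    the independence-between-positions assumption.
    """
    if len(site) != len(motif):
        raise ValueError("site length must match motif length")
    return sum(
        math.log2(column[base] / background[base])
        for column, base in zip(motif, site)
    )


def best_window(sequence, motif=MOTIF, background=BACKGROUND):
    """Return (start, score) of the highest-scoring window in a sequence."""
    width = len(motif)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    starts = range(len(sequence) - width + 1)
    start = max(
        starts,
        key=lambda i: pssm_score(sequence[i:i + width], motif, background),
    )
    return start, pssm_score(sequence[start:start + width], motif, background)


if __name__ == "__main__":
    print(pssm_score("ACGT"))
    print(best_window("TTACGTTT"))
```

    Because the score is a plain sum over positions, a model of this kind cannot express a preference that depends on two positions jointly, which is the limitation that features spanning multiple positions are meant to address.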

    Le Village suisse comme modèle d'urbanisme

    This chapter introduces systems biology, its context, aims, concepts and strategies. It then describes approaches and methods used for collection of high-dimensional structural and functional genomics data, including epigenomics, transcriptomics, proteomics, metabolomics and lipidomics, and how recent technological advances in these fields have moved the bottleneck from data production to data analysis and bioinformatics. Finally, the most advanced mathematical and computational methods used for clustering, feature selection, prediction analysis, text mining and pathway analysis in functional genomics and systems biology are reviewed and discussed in the context of use cases

    Eliminating hepatitis C in Australia: a novel model of hepatitis C testing and treatment for people who inject drugs at a medically supervised injecting facility

    OBJECTIVE: To evaluate the feasibility of testing and treating people who inject drugs at a supervised injecting facility for hepatitis C virus (HCV) infection. DESIGN: Retrospective cohort study. SETTING, PARTICIPANTS: People who inject drugs who attended the Melbourne supervised injecting facility, 30 June 2018 - 30 June 2020. MAIN OUTCOME MEASURES: Proportion of people tested for hepatitis C; proportions of people positive for anti-HCV antibody and HCV RNA, and of eligible people prescribed direct-acting antiviral (DAA) treatment; sustained virological response twelve weeks or more after treatment completion. RESULTS: Of 4649 people who attended the supervised injecting facility during 2018-20, 321 were tested for hepatitis C (7%); 279 were anti-HCV antibody-positive (87%), of whom 143 (51%) were also HCV RNA-positive. Sixty-four of 321 had previously been treated for hepatitis C (20%), 21 had clinically identified cirrhosis (7%), eight had hepatitis B infections (2%), and four had human immunodeficiency virus infections (1%). In multivariate analyses, people tested for hepatitis C were more likely than untested clients to report psychiatric illness (adjusted odds ratio [aOR], 9.65; 95% confidence interval [CI], 7.26-12.8), not have a fixed address (aOR, 1.59; 95% CI, 1.18-2.14), and to report significant alcohol use (aOR, 1.57; 95% CI, 1.06-2.32). The median number of injecting facility visits was larger for those tested for hepatitis C (101; interquartile range [IQR], 31-236) than for those not tested (20; IQR, 3-90). DAA treatment was prescribed for 126 of 143 HCV RNA-positive clients (88%); 41 of 54 with complete follow-up data were cured (76%). CONCLUSIONS: People who attend supervised injecting facilities can be tested and treated for hepatitis C on site. Models that provide streamlined, convenient hepatitis C care promote engagement with treatment in a group in which the prevalence of hepatitis C is high
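
    The proportions reported in the abstract above can be rechecked with simple arithmetic. The short Python sketch below recomputes each percentage from the stated counts; the row labels and function names are hypothetical, and the denominator of 321 for the cirrhosis, hepatitis B, and HIV rows is an assumption inferred from the wording of the abstract.

```python
# Each row: (label, numerator, denominator, percentage reported in the abstract).
# The denominator of 321 for cirrhosis, hepatitis B, and HIV is an assumption.
REPORTED = [
    ("tested for hepatitis C", 321, 4649, 7),
    ("anti-HCV antibody-positive", 279, 321, 87),
    ("HCV RNA-positive among antibody-positive", 143, 279, 51),
    ("previously treated for hepatitis C", 64, 321, 20),
    ("clinically identified cirrhosis", 21, 321, 7),
    ("hepatitis B infection", 8, 321, 2),
    ("HIV infection", 4, 321, 1),
    ("DAA treatment prescribed", 126, 143, 88),
    ("cured, complete follow-up data", 41, 54, 76),
]


def percent(numerator, denominator):
    """Return numerator / denominator expressed as a percentage."""
    return 100.0 * numerator / denominator


def check(rows):
    """Recompute each percentage and flag whether it rounds to the reported value."""
    results = []
    for label, numerator, denominator, reported in rows:
        value = percent(numerator, denominator)
        results.append((label, round(value, 1), reported, round(value) == reported))
    return results


if __name__ == "__main__":
    for label, value, reported, matches in check(REPORTED):
        status = "ok" if matches else "MISMATCH"
        print(f"{label}: {value}% (reported {reported}%) {status}")
```

    Under these assumptions, every recomputed value rounds to the percentage given in the abstract.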