
    Differential expression analysis for sequence count data

    Motivation: High-throughput nucleotide sequencing provides quantitative readouts in assays for RNA expression (RNA-Seq), protein-DNA binding (ChIP-Seq) or cell counting (barcode sequencing). Statistical inference of differential signal in such data requires estimation of their variability throughout the dynamic range. When the number of replicates is small, error modelling is needed to achieve statistical power.

    Results: We propose an error model that uses the negative binomial distribution, with variance and mean linked by local regression, to model the null distribution of the count data. The method controls type-I error and provides good detection power.

    Availability: A free open-source R software package, DESeq, is available from the Bioconductor project and from http://www-huber.embl.de/users/anders/DESeq
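
To make the error model in the abstract above concrete, the following is a minimal sketch of how a negative binomial null distribution can be parametrized by a mean and a variance. It is not DESeq's code: the function names (`nb_params`, `nb_pmf`, `upper_tail`) are hypothetical, the variance is passed in by hand rather than estimated by local regression, and the one-sided tail probability is only an illustration, not the test used by the package.

```python
import math


def nb_params(mean, variance):
    """Convert (mean, variance) into negative binomial (size r, success prob p).

    Uses the parametrization with mean = r(1-p)/p and variance = r(1-p)/p^2,
    which is only defined when the variance exceeds the mean (overdispersion).
    """
    if variance <= mean:
        raise ValueError("negative binomial requires variance > mean")
    p = mean / variance
    r = mean * mean / (variance - mean)
    return r, p


def nb_pmf(k, r, p):
    """Probability of observing count k; computed in log space for stability."""
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def upper_tail(k_obs, mean, variance):
    """P(K >= k_obs) under the null with the given mean and variance."""
    r, p = nb_params(mean, variance)
    return 1.0 - sum(nb_pmf(k, r, p) for k in range(k_obs))


if __name__ == "__main__":
    # Illustrative numbers only: a gene with null mean 10 and variance 30.
    print(nb_params(10.0, 30.0))
    print(upper_tail(25, 10.0, 30.0))
```

The point of the mean-variance link is that the variance argument would come from a fitted curve evaluated at the mean, so that genes with few replicates borrow information from genes of similar expression strength.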

    FRA2A is a CGG repeat expansion associated with silencing of AFF3

    Folate-sensitive fragile sites (FSFS) are a rare cytogenetically visible subset of dynamic mutations. Of the eight molecularly characterized FSFS, four are associated with intellectual disability (ID). Cytogenetic expression results from CGG trinucleotide-repeat expansion mutation associated with local CpG hypermethylation and transcriptional silencing. The best studied is the FRAXA site in the FMR1 gene, where large expansions cause fragile X syndrome, the most common inherited ID syndrome. Here we studied three families with FRA2A expression at 2q11 associated with a wide spectrum of neurodevelopmental phenotypes. We identified a polymorphic CGG repeat in a conserved, brain-active alternative promoter of the AFF3 gene, an autosomal homolog of the X-linked AFF2/FMR2 gene; expansion of the AFF2 CGG repeat causes FRAXE ID. We found that FRA2A-expressing individuals have mosaic expansions of the AFF3 CGG repeat in the range of several hundred repeat units. Moreover, bisulfite sequencing and pyrosequencing both suggest AFF3 promoter hypermethylation. cSNP analysis demonstrates monoallelic expression of the AFF3 gene in FRA2A carriers, thus predicting that FRA2A expression results in functional haploinsufficiency for AFF3, at least in a subset of tissues. By whole-mount in situ hybridization, the mouse AFF3 ortholog shows strong regional expression in the developing brain, somites and limb buds in 9.5-12.5 dpc mouse embryos. Our data suggest that there may be an association between FRA2A and a delay in the acquisition of motor and language skills in the families studied here. However, additional cases are required to firmly establish a causal relationship.

    DGCR8 HITS-CLIP reveals novel functions for the Microprocessor

    The Drosha-DGCR8 complex (Microprocessor) is required for microRNA (miRNA) biogenesis. DGCR8 recognizes the RNA substrate, whereas Drosha functions as the endonuclease. High-throughput sequencing and crosslinking immunoprecipitation (HITS-CLIP) was used to identify RNA targets of DGCR8 in human cells. Unexpectedly, miRNAs were not the most abundant targets. DGCR8-bound RNAs also comprised several hundred mRNAs as well as snoRNAs and long non-coding RNAs. We found that the Microprocessor controls the abundance of several mRNAs as well as of MALAT-1. By contrast, DGCR8-mediated cleavage of snoRNAs is independent of Drosha, suggesting the involvement of DGCR8 in cellular complexes with other endonucleases. Interestingly, binding of DGCR8 to cassette exons acts as a novel mechanism to regulate the relative abundance of alternatively spliced isoforms. Collectively, these data provide new insights into the complex role of DGCR8 in controlling the fate of several classes of RNAs.

    lincRNAs act in the circuitry controlling pluripotency and differentiation

    Although thousands of large intergenic non-coding RNAs (lincRNAs) have been identified in mammals, few have been functionally characterized, leading to debate about their biological role. To address this, we performed loss-of-function studies on most lincRNAs expressed in mouse embryonic stem (ES) cells and characterized the effects on gene expression. Here we show that knockdown of lincRNAs has major consequences on gene expression patterns, comparable to knockdown of well-known ES cell regulators. Notably, lincRNAs primarily affect gene expression in trans. Knockdown of dozens of lincRNAs causes either exit from the pluripotent state or upregulation of lineage commitment programs. We integrate lincRNAs into the molecular circuitry of ES cells and show that lincRNA genes are regulated by key transcription factors and that lincRNA transcripts bind to multiple chromatin regulatory proteins to affect shared gene expression programs. Together, the results demonstrate that lincRNAs have key roles in the circuitry controlling ES cell state. Funding: Broad Institute; Harvard University; National Human Genome Research Institute (U.S.); Merkin Family Foundation for Stem Cell Research.

    Genome-wide identification of Ago2 binding sites from mouse embryonic stem cells with and without mature microRNAs

    MicroRNAs (miRNAs) are 19–22-nucleotide noncoding RNAs that post-transcriptionally regulate mRNA targets. We have identified endogenous miRNA binding sites in mouse embryonic stem cells (mESCs) by performing photo-cross-linking immunoprecipitation using antibodies to Argonaute (Ago2) followed by deep sequencing of RNAs (CLIP-seq). We also performed CLIP-seq in Dicer−/− mESCs that lack mature miRNAs, allowing us to define whether the association of Ago2 with the identified sites was miRNA dependent. A significantly enriched motif, GCACUU, was identified only in wild-type mESCs in 3′ untranslated and coding regions. This motif matches the seed of a miRNA family that constitutes ~68% of the mESC miRNA population. Unexpectedly, a G-rich motif was enriched in sequences cross-linked to Ago2 in both the presence and absence of miRNAs. Expression analysis and reporter assays confirmed that the seed-related motif confers miRNA-directed regulation on host mRNAs and that the G-rich motif can modulate this regulation. Funding: Leukemia & Lymphoma Society of America; United States Public Health Service (Grant R01-GM34277); United States Public Health Service (Grant R01-CA133404); National Cancer Institute (U.S.) (Grant P01-CA42063); National Cancer Institute (U.S.) Cancer Center Support (Grant P30-CA14051).

    Cytoplasmic Polyadenylation Element Binding Protein Deficiency Stimulates PTEN and Stat3 mRNA Translation and Induces Hepatic Insulin Resistance

    The cytoplasmic polyadenylation element binding protein CPEB1 (CPEB) regulates germ cell development, synaptic plasticity, and cellular senescence. A microarray analysis of mRNAs regulated by CPEB unexpectedly showed that several encoded proteins are involved in insulin signaling. An investigation of Cpeb1 knockout mice revealed that the expression of two particular negative regulators of insulin action, PTEN and Stat3, was aberrantly increased. Insulin signaling to Akt was attenuated in livers of CPEB-deficient mice, suggesting that they might be defective in regulating glucose homeostasis. Indeed, when the Cpeb1 knockout mice were fed a high-fat diet, their livers became insulin-resistant. Analysis of HepG2 cells, a human liver cell line, depleted of CPEB demonstrated that this protein directly regulates the translation of PTEN and Stat3 mRNAs. Our results show that CPEB-regulated translation is a key process involved in insulin signaling.

    Predicting RNA-Protein Interactions Using Only Sequence Information

    Background: RNA-protein interactions (RPIs) play important roles in a wide variety of cellular processes, ranging from transcriptional and post-transcriptional regulation of gene expression to host defense against pathogens. High-throughput experiments to identify RNA-protein interactions are beginning to provide valuable information about the complexity of RNA-protein interaction networks, but are expensive and time-consuming. Hence, there is a need for reliable computational methods for predicting RNA-protein interactions. Results: We propose RPISeq, a family of classifiers for predicting RNA-protein interactions using only sequence information. Given the sequences of an RNA and a protein as input, RPISeq predicts whether or not the RNA-protein pair interact. The RNA sequence is encoded as a normalized vector of its ribonucleotide 4-mer composition, and the protein sequence is encoded as a normalized vector of its 3-mer composition, based on a 7-letter reduced alphabet representation. Two variants of RPISeq are presented: RPISeq-SVM, which uses a Support Vector Machine (SVM) classifier, and RPISeq-RF, which uses a Random Forest classifier. On two non-redundant benchmark datasets extracted from the Protein-RNA Interface Database (PRIDB), RPISeq achieved an AUC (Area Under the Receiver Operating Characteristic (ROC) curve) of 0.96 and 0.92. On a third dataset containing only mRNA-protein interactions, the performance of RPISeq was competitive with that of a published method that requires information regarding many different features (e.g., mRNA half-life, GO annotations) of the putative RNA and protein partners. In addition, RPISeq classifiers trained using the PRIDB data correctly predicted the majority (57-99%) of non-coding RNA-protein interactions in NPInter-derived networks from E. coli, S. cerevisiae, D. melanogaster, M. musculus, and H. sapiens. Conclusions: Our experiments with RPISeq demonstrate that RNA-protein interactions can be reliably predicted using only sequence-derived information. RPISeq offers an inexpensive method for computational construction of RNA-protein interaction networks, and should provide useful insights into the function of non-coding RNAs. RPISeq is freely available as a web-based server at http://pridb.gdcb.iastate.edu/RPISeq/.
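
The sequence encoding described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the RPISeq implementation: the abstract does not say which 7-letter reduced alphabet is used, so the grouping in `AA_GROUPS` (the conjoint-triad classes) is an assumption, as is normalizing by the total k-mer count. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

RNA_ALPHABET = "ACGU"

# ASSUMPTION: a 7-class amino acid grouping (conjoint-triad style).
# The abstract only says "7-letter reduced alphabet".
AA_GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: str(i) for i, group in enumerate(AA_GROUPS) for aa in group}
CLASS_ALPHABET = "0123456"


def kmer_composition(seq, alphabet, k):
    """Return k-mer frequencies of seq, ordered over all len(alphabet)**k k-mers.

    ASSUMPTION: "normalized" is taken to mean dividing by the number of
    valid k-mers, so the vector sums to 1 (or is all zeros if none occur).
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def encode_rna(rna):
    """256-dimensional 4-mer composition vector (4**4 possible 4-mers)."""
    return kmer_composition(rna.upper().replace("T", "U"), RNA_ALPHABET, 4)


def encode_protein(protein):
    """343-dimensional 3-mer composition over the reduced alphabet (7**3)."""
    # Residues outside the 20 standard amino acids are dropped, which joins
    # their neighbours; a stricter encoder might skip such windows instead.
    reduced = "".join(AA_TO_CLASS.get(aa, "") for aa in protein.upper())
    return kmer_composition(reduced, CLASS_ALPHABET, 3)


if __name__ == "__main__":
    rna_vec = encode_rna("ACGUACGUGGCA")
    prot_vec = encode_protein("MKVLAAGIRDEC")
    print(len(rna_vec), len(prot_vec))
```

Concatenating the two vectors gives a fixed-length feature vector for an RNA-protein pair, which is the kind of input an SVM or Random Forest classifier expects regardless of sequence length.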

    Detection and Removal of Biases in the Analysis of Next-Generation Sequencing Reads

    Since the emergence of next-generation sequencing (NGS) technologies, great effort has been put into the development of tools for analysis of the short reads. In parallel, knowledge is increasing regarding biases inherent in these technologies. Here we discuss four different biases we encountered while analyzing various Illumina datasets. These biases are due to both biological and statistical effects that in particular affect comparisons between different genomic regions. Specifically, we encountered biases pertaining to the distributions of nucleotides across sequencing cycles, to mappability, to contamination of pre-mRNA with mRNA, and to non-uniform hydrolysis of RNA. Most of these biases are not specific to one analyzed dataset, but are present across a variety of datasets and within a variety of genomic contexts. Importantly, some of these biases correlated in a highly significant manner with biological features, including transcript length, gene expression levels, conservation levels, and exon-intron architecture, which misleadingly increases the credibility of results that arise from them. We also demonstrate the relevance of these biases in the context of analyzing an NGS dataset mapping transcriptionally engaged RNA polymerase II (RNAPII) in the context of exon-intron architecture, and show that elimination of these biases is crucial for avoiding erroneous interpretation of the data. Collectively, our results highlight several important pitfalls, challenges and approaches in the analysis of NGS reads.

    Mapping exosome-substrate interactions in vivo by UV cross-linking

    The RNA exosome complex functions in both the accurate processing and rapid degradation of many classes of RNA in eukaryotes and Archaea. Functional and structural analyses indicate that RNA can either be threaded through the central channel of the exosome or more directly access the active sites of the ribonucleases Rrp44 and Rrp6, but in most cases, it remains unclear how many substrates follow each pathway in vivo. Here we describe the method for using a UV cross-linking technique termed CRAC to generate stringent, transcriptome-wide mapping of exosome–substrate interaction sites in vivo and at base-pair resolution.

    Evolutionary Constraint Helps Unmask a Splicing Regulatory Region in BRCA1 Exon 11

    BACKGROUND: Alternative splicing across exon 11 produces several BRCA1 isoforms. Their proportion varies during the cell cycle, between tissues and in cancer, suggesting functional importance of BRCA1 splicing regulation around this exon. Although the regulatory elements driving exon 11 splicing have never been identified, a selective constraint against synonymous substitutions (silent nucleotide variations that do not alter the amino acid residue sequence) in a critical region of BRCA1 exon 11 has been reported to be associated with the necessity to maintain regulatory sequences. METHODOLOGY/PRINCIPAL FINDINGS: Here we have designed a specific minigene to investigate the possibility that this bias in synonymous codon usage reflects the need to preserve the BRCA1 alternative splicing program. We report that in-frame deletions and translationally silent nucleotide substitutions in the critical region affect splicing regulation of BRCA1 exon 11. CONCLUSIONS/SIGNIFICANCE: Using a hybrid minigene approach, we have experimentally validated the hypothesis that the need to maintain correct alternative splicing is a selective pressure against translationally silent sequence variations in the critical region of BRCA1 exon 11. Identification of the trans-acting factors involved in regulating exon 11 alternative splicing will be important in understanding BRCA1-associated tumorigenesis.