    Predicting enhancer regions and transcription factor binding sites in D. melanogaster

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 71-75). Identifying regions of the genome that have regulatory function is central to the fundamental biological problem of understanding the mechanisms through which a regulatory sequence drives specific spatial and temporal patterns of gene expression in early development. The modENCODE project aims to comprehensively identify functional elements in the C. elegans and D. melanogaster genomes. Within this project, the genome-wide binding locations of all known transcription factors, as well as of other DNA-binding proteins, are currently being mapped [8]. The large quantity of new data becoming available through the modENCODE project and other experimental efforts offers the potential to gain insight into the mechanisms of gene regulation. Developing improved approaches to identify functional regions and understand their architecture from the available experimental data is a critical part of the modENCODE effort. Towards this goal, I use a machine learning approach to study the predictive power of combinations of experimental and sequence-based features for predicting enhancers and transcription factor binding sites. By Rachel Sealfon. S.M.

    GOLEM: an interactive graph-based gene-ontology navigation and analysis tool

    BACKGROUND: The Gene Ontology has become an extremely useful tool for the analysis of genomic data and the structuring of biological knowledge. Several excellent software tools for navigating the gene ontology have been developed. However, no existing system provides an interactively expandable graph-based view of the gene ontology hierarchy. Furthermore, most existing tools are web-based or require an Internet connection, cannot load local annotation files, and provide either analysis or visualization functionality, but not both. RESULTS: To address these limitations, we have developed GOLEM (Gene Ontology Local Exploration Map), a visualization and analysis tool for focused exploration of the gene ontology graph. GOLEM allows the user to dynamically expand and focus the local graph structure of the gene ontology hierarchy in the neighborhood of any chosen term. It also supports rapid analysis of an input list of genes to find enriched gene ontology terms. The GOLEM application lets the user either use local gene ontology and annotation files in the absence of an Internet connection, or access the most recent ontology and annotation information from the gene ontology webpage. GOLEM supports global and organism-specific searches by gene ontology term name, gene ontology ID and gene name. CONCLUSION: GOLEM is a useful software tool for biologists interested in visualizing the local directed acyclic graph structure of the gene ontology hierarchy and searching for gene ontology terms enriched in genes of interest. It is freely available both as an application and as an applet at
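    The abstract does not say which statistic GOLEM uses to call a term "enriched". A common choice for this kind of gene-list analysis is a one-sided hypergeometric test, sketched below purely as an illustration; the function name `hypergeom_enrichment_p` and its parameters are hypothetical and not part of GOLEM. For a background of N annotated genes, K of which carry the term, and an input list of n genes, k of which carry the term, it returns P(X >= k).

```python
from math import comb


def hypergeom_enrichment_p(total_genes: int, term_genes: int,
                           list_size: int, overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= overlap) for a single GO term.

    Hypothetical helper, not GOLEM's API.
    total_genes: N, genes in the annotated background
    term_genes:  K, background genes annotated to the term
    list_size:   n, genes in the user's input list
    overlap:     k, input-list genes annotated to the term
    """
    denom = comb(total_genes, list_size)
    upper = min(term_genes, list_size)
    # Sum the probability of every overlap at least as large as observed.
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing.
    tail = sum(
        comb(term_genes, x) * comb(total_genes - term_genes, list_size - x)
        for x in range(overlap, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Toy numbers: 10 background genes, 5 annotated, both list genes hit.
    p = hypergeom_enrichment_p(total_genes=10, term_genes=5,
                               list_size=2, overlap=2)
    print(p)  # 10/45, about 0.222
```

    In practice such a test is run once per term, so the resulting p-values would normally be corrected for multiple testing before a term is reported as enriched.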

    A Cis-Regulatory Map of the Drosophila Genome

    Systematic annotation of gene regulatory elements is a major challenge in genome science. Direct mapping of chromatin modification marks and transcription factor binding sites genome-wide [1, 2] has successfully identified specific subtypes of regulatory elements [3]. In Drosophila, several pioneering studies have provided genome-wide identification of Polycomb response elements [4], chromatin states [5], transcription factor binding sites [6, 7, 8, 9], RNA polymerase II regulation [8] and insulator elements [10]; however, comprehensive annotation of the regulatory genome remains a significant challenge. Here we describe results from the modENCODE cis-regulatory annotation project. We produced a map of the Drosophila melanogaster regulatory genome on the basis of more than 300 chromatin immunoprecipitation data sets for eight chromatin features, five histone deacetylases and thirty-eight site-specific transcription factors at different stages of development. Using these data we inferred more than 20,000 candidate regulatory elements and validated a subset of predictions for promoters, enhancers and insulators in vivo. We also identified nearly 2,000 genomic regions of dense transcription factor binding associated with chromatin activity and accessibility. We discovered hundreds of new transcription factor co-binding relationships and defined a transcription factor network with over 800 potential regulatory relationships.

    Nomenclature- and Database-Compatible Names for the Two Ebola Virus Variants that Emerged in Guinea and the Democratic Republic of the Congo in 2014

    In 2014, Ebola virus (EBOV) was identified as the etiological agent of a large and still expanding outbreak of Ebola virus disease (EVD) in West Africa and a much more confined EVD outbreak in Middle Africa. Epidemiological and evolutionary analyses confirmed that all cases of each outbreak are connected to a single introduction of EBOV into human populations and that the two outbreaks are not directly connected. Coding-complete genomic sequence analyses of isolates revealed that the two outbreaks were caused by two novel EBOV variants, and initial clinical observations suggest that neither of them should be considered a strain. Here we present consensus decisions on naming for both variants (West Africa: “Makona”, Middle Africa: “Lomela”) and provide database-compatible full, shortened, and abbreviated names that are in line with recently established filovirus sub-species nomenclatures.

    Filovirus RefSeq Entries: Evaluation and Selection of Filovirus Type Variants, Type Sequences, and Names

    Sequence determination of complete or coding-complete genomes of viruses is becoming common practice for supporting the work of epidemiologists, ecologists, virologists, and taxonomists. Sequencing duration and costs are rapidly decreasing, sequencing hardware is being adapted for use by non-experts, and software is constantly being improved to simplify sequence data management and analysis. Thus, analysis of virus disease outbreaks at the molecular level is now feasible, including characterization of the evolution of individual virus populations in single patients over time. The increasing accumulation of sequencing data creates a management problem for the curators of commonly used sequence databases and an entry retrieval problem for end users. Therefore, using these data to their fullest potential will require setting nomenclature and annotation standards for virus isolates and associated genomic sequences. The National Center for Biotechnology Information’s (NCBI’s) RefSeq is a non-redundant, curated database for reference (or type) nucleotide sequence records that supplies source data to numerous other databases. Building on recently proposed templates for filovirus variant naming [<virus name> (<strain>)/<isolation host-suffix>/<country of sampling>/<year of sampling>/<genetic variant designation>-<isolate designation>], we report consensus decisions from a majority of past and currently active filovirus experts on the eight filovirus type variants and isolates to be represented in RefSeq, their final designations, and their associated sequences.

    Double-stranded RNA drives SARS-CoV-2 nucleocapsid protein to undergo phase separation at specific temperatures

    Nucleocapsid protein (N-protein) is required for multiple steps in betacoronavirus replication. SARS-CoV-2 N-protein condenses with specific viral RNAs at particular temperatures, making it a powerful model for deciphering RNA sequence specificity in condensates. We identify two separate and distinct double-stranded RNA motifs (dsRNA stickers) that promote N-protein condensation. These dsRNA stickers are separately recognized by N-protein's two RNA binding domains (RBDs). RBD1 prefers structured RNA with sequences like the transcription-regulatory sequence (TRS). RBD2 prefers long stretches of dsRNA, independent of sequence. Thus, the two N-protein RBDs interact with distinct dsRNA stickers, and these interactions impart specific droplet physical properties that could support varied viral functions. Specifically, we find that addition of dsRNA lowers the condensation temperature in a manner dependent on RBD2 interactions and tunes translational repression. In contrast, RBD1 sites are sequences critical for sub-genomic (sg) RNA generation and promote gRNA compression. The density of RBD1 binding motifs in proximity to TRS-L/B sequences is associated with levels of sub-genomic RNA generation. The switch to packaging is likely mediated by RBD1 interactions, which generate particles that recapitulate the packaging unit of the virion. Thus, SARS-CoV-2 can achieve biochemical complexity, performing multiple functions in the same cytoplasm with minimal protein components, by using multiple distinct RNA motifs that control N-protein interactions.

    First lensing measurements of SZ-discovered clusters

    We present the first lensing mass measurements of Sunyaev-Zel'dovich (SZ) selected clusters. Using optical imaging from the Southern Cosmology Survey (SCS), we present weak lensing masses for three clusters selected by their SZ emission in the South Pole Telescope (SPT) survey. We confirm that the SZ selection procedure is successful in detecting mass concentrations. We also study the weak lensing signals from 38 optically selected clusters in ~8 square degrees of the SCS survey. We fit Navarro, Frenk and White (NFW) profiles and find that the SZ clusters are among the most massive, with masses as high as 5 × 10^14 M☉. Using the best-fit masses for all the clusters, we analytically calculate the expected SZ integrated Y parameter, which we find to be consistent with the SPT observations. Comment: Minor changes to match accepted version; 5 pages, 3 figures.
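    For reference, the NFW profile mentioned above has the standard two-parameter form below, with characteristic density ρ_s and scale radius r_s; integrating it gives the mass enclosed within radius r. The abstract does not state which mass definition or concentration relation the authors adopted, so only the generic profile is shown here.

```latex
\rho(r) = \frac{\rho_s}{\left(r/r_s\right)\left(1 + r/r_s\right)^{2}},
\qquad
M(<r) = 4\pi \rho_s r_s^{3}
\left[\ln\!\left(1 + \frac{r}{r_s}\right) - \frac{r/r_s}{1 + r/r_s}\right]
```

    The enclosed-mass expression follows from integrating 4πr²ρ(r) from 0 to r, which reduces to the elementary integral of t/(1 + t)² with t = r/r_s.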

    Identification of functional elements and regulatory circuits by Drosophila modENCODE

    To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.

    Virus genomes reveal factors that spread and sustained the Ebola epidemic

    The 2013-2016 West African epidemic caused by the Ebola virus was of unprecedented magnitude, duration and impact. Here we reconstruct the dispersal, proliferation and decline of Ebola virus throughout the region by analysing 1,610 Ebola virus genomes, which represent over 5% of the known cases. We test the association of geography, climate and demography with viral movement among administrative regions, inferring a classic 'gravity' model, with intense dispersal between larger and closer populations. Despite attenuation of international dispersal after border closures, cross-border transmission had already sown the seeds for an international epidemic, rendering these measures ineffective at curbing the epidemic. We address why the epidemic did not spread into neighbouring countries, showing that these countries were susceptible to substantial outbreaks but at lower risk of introductions. Finally, we reveal that this large epidemic was a heterogeneous and spatially dissociated collection of transmission clusters of varying size, duration and connectivity. These insights will help to inform interventions in future epidemics.
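    The abstract names a "gravity" model without giving its functional form. The textbook version, shown here only as an illustration and not as the paper's fitted model, makes the expected flow F_ij between regions i and j grow with their populations P_i and P_j and shrink with the distance d_ij between them; the exponents α, β and γ are generic placeholders, not values reported in the study.

```latex
F_{ij} \propto \frac{P_i^{\alpha}\, P_j^{\beta}}{d_{ij}^{\gamma}},
\qquad \alpha, \beta, \gamma > 0
```

    With all three exponents positive, larger and closer population pairs receive the highest expected dispersal, which is the qualitative pattern the abstract describes.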