37 research outputs found

    ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data

    Background: Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) or ChIP followed by genome tiling array analysis (ChIP-chip) have become standard technologies for genome-wide identification of DNA-binding protein target sites. A number of algorithms have been developed in parallel that allow identification of binding sites from ChIP-seq or ChIP-chip datasets and subsequent visualization in the University of California Santa Cruz (UCSC) Genome Browser as custom annotation tracks. However, summarizing these tracks can be a daunting task, particularly if there are a large number of binding sites or the binding sites are distributed widely across the genome. Results: We have developed ChIPpeakAnno as a Bioconductor package within the statistical programming environment R to facilitate batch annotation of enriched peaks identified from ChIP-seq, ChIP-chip, cap analysis of gene expression (CAGE) or any experiments resulting in a large number of enriched genomic regions. The binding sites annotated with ChIPpeakAnno can be viewed easily as a table, a pie chart or plotted in histogram form, i.e., the distribution of distances to the nearest genes for each set of peaks. In addition, we have implemented functionalities for determining the significance of overlap between replicates or binding sites among transcription factors within a complex, and for drawing Venn diagrams to visualize the extent of the overlap between replicates. Furthermore, the package includes functionalities to retrieve sequences flanking putative binding sites for PCR amplification, cloning, or motif discovery, and to identify Gene Ontology (GO) terms associated with adjacent genes. Conclusions: ChIPpeakAnno enables batch annotation of the binding sites identified from ChIP-seq, ChIP-chip, CAGE or any technology that results in a large number of enriched genomic regions within the statistical programming environment R. Allowing users to pass their own annotation data such as a different Chromatin immunoprecipitation (ChIP) preparation and a dataset from literature, or existing annotation packages, such as GenomicFeatures and BSgenome, provides flexibility. Tight integration to the biomaRt package enables up-to-date annotation retrieval from the BioMart database.
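
The nearest-gene annotation described in this abstract can be illustrated with a minimal sketch. This is plain Python, not the ChIPpeakAnno API (which is an R package); the function name `annotate_nearest_tss`, the tuple layouts, and the choice to measure from the peak midpoint to a transcription start site (TSS) are all assumptions made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


def annotate_nearest_tss(peaks, genes):
    """Attach the nearest TSS on the same chromosome to each peak.

    peaks: list of (chrom, start, end)        -- hypothetical layout
    genes: list of (chrom, tss_position, name) -- hypothetical layout
    Returns a list of (chrom, start, end, gene_name, signed_distance),
    where signed_distance = peak midpoint - TSS position.
    """
    # Group TSS positions per chromosome and sort them for binary search.
    by_chrom = defaultdict(list)
    for chrom, tss, name in genes:
        by_chrom[chrom].append((tss, name))
    for sites in by_chrom.values():
        sites.sort()

    annotated = []
    for chrom, start, end in peaks:
        sites = by_chrom.get(chrom)
        if not sites:
            # No annotated gene on this chromosome.
            annotated.append((chrom, start, end, None, None))
            continue
        mid = (start + end) // 2
        # Find the insertion point, then compare the neighbours on each side.
        i = bisect_left(sites, (mid,))
        candidates = sites[max(i - 1, 0):i + 1]
        tss, name = min(candidates, key=lambda site: abs(site[0] - mid))
        annotated.append((chrom, start, end, name, mid - tss))
    return annotated


if __name__ == "__main__":
    example_peaks = [("chr1", 100, 200), ("chr1", 900, 1000), ("chr2", 50, 60)]
    example_genes = [("chr1", 120, "geneA"), ("chr1", 1000, "geneB")]
    for row in annotate_nearest_tss(example_peaks, example_genes):
        print(row)
```

The signed distances returned here are the kind of values that could be summarized in a histogram of distances to the nearest gene; strand-aware distances are left out to keep the sketch short.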

    Comparative transcriptome analysis reveals vertebrate phylotypic period during organogenesis

    One of the central issues in evolutionary developmental biology is how we can formulate the relationships between evolutionary and developmental processes. Two major models have been proposed: the 'funnel-like' model, in which the earliest embryo shows the most conserved morphological pattern, followed by diversifying later stages, and the 'hourglass' model, in which constraints are imposed to conserve organogenesis stages, which is called the phylotypic period. Here we perform a quantitative comparative transcriptome analysis of several model vertebrate embryos and show that the pharyngula stage is most conserved, whereas earlier and later stages are rather divergent. These results allow us to predict approximate developmental timetables between different species, and indicate that pharyngula embryos have the most conserved gene expression profiles, which may be the source of the basic body plan of vertebrates

    Gene expression signatures of morphologically normal breast tissue identify basal-like tumors

    INTRODUCTION: The role of the cellular microenvironment in breast tumorigenesis has become an important research area. However, little is known about gene expression in histologically normal tissue adjacent to breast tumor, if this is influenced by the tumor, and how this compares with non-tumor-bearing breast tissue. METHODS: To address this, we have generated gene expression profiles of morphologically normal epithelial and stromal tissue, isolated using laser capture microdissection, from patients with breast cancer or undergoing breast reduction mammoplasty (n = 44). RESULTS: Based on this data, we determined that morphologically normal epithelium and stroma exhibited distinct expression profiles, but molecular signatures that distinguished breast reduction tissue from tumor-adjacent normal tissue were absent. Stroma isolated from morphologically normal ducts adjacent to tumor tissue contained two distinct expression profiles that correlated with stromal cellularity, and shared similarities with soft tissue tumors with favorable outcome. Adjacent normal epithelium and stroma from breast cancer patients showed no significant association between expression profiles and standard clinical characteristics, but did cluster ER/PR/HER2-negative breast cancers with basal-like subtype expression profiles with poor prognosis. CONCLUSION: Our data reveal that morphologically normal tissue adjacent to breast carcinomas has not undergone significant gene expression changes when compared to breast reduction tissue, and provide an important gene expression dataset for comparative studies of tumor expression profiles

    Gene Expression Profiling during Early Acute Febrile Stage of Dengue Infection Can Predict the Disease Outcome

    Background: We report the detailed development of biomarkers to predict the clinical outcome under dengue infection. Transcriptional signatures from purified peripheral blood mononuclear cells were derived from whole-genome gene-expression microarray data, validated by quantitative PCR and tested in independent samples. Methodology/Principal Findings: The study was performed on patients of a well-characterized dengue cohort from Recife, Brazil. The samples analyzed were collected prospectively from acute febrile dengue patients who evolved with different degrees of disease severity: classic dengue fever or dengue hemorrhagic fever (DHF) samples were compared with similar samples from other non-dengue febrile illnesses. The DHF samples were collected 2-3 days before the presentation of the plasma leakage symptoms. Differentially-expressed genes were selected by univariate statistical tests as well as multivariate classification techniques. The results showed that at early stages of dengue infection, the genes involved in effector mechanisms of innate immune response presented a weaker activation on patients who later developed hemorrhagic fever, whereas the genes involved in apoptosis were expressed in higher levels. Conclusions/Significance: Some of the gene expression signatures displayed estimated accuracy rates of more than 95%, indicating that expression profiling with these signatures may provide a useful means of DHF prognosis at early stages of infection. © 2009 Nascimento et al

    Meta-analysis of muscle transcriptome data using the MADMuscle database reveals biologically relevant gene patterns

    Background: DNA microarray technology has had a great impact on muscle research, and microarray gene expression data have been widely used to identify gene signatures characteristic of the studied conditions. With the rapid accumulation of muscle microarray data, it is of great interest to understand how to compare and combine data across multiple studies. Meta-analysis of transcriptome data is a valuable method for achieving this: it makes it possible to highlight gene signatures conserved across multiple independent studies. However, its use is made difficult by the diversity of the available data: different microarray platforms, different gene nomenclature, different species studied, etc. Description: We have developed a system dedicated to muscle transcriptome data. This system comprises a collection of microarray data as well as a query tool. The latter allows the user to extract similar clusters of co-expressed genes from the database, using an input gene list. Common and relevant gene signatures can thus be searched more easily. The dedicated database consists of a large compendium of public data (more than 500 data sets) related to muscle (skeletal and heart). These studies included seven different animal species from invertebrates (Drosophila melanogaster, Caenorhabditis elegans) and vertebrates (Homo sapiens, Mus musculus, Rattus norvegicus, Canis familiaris, Gallus gallus). After a renormalization step, clusters of co-expressed genes were identified in each dataset. The lists of co-expressed genes were annotated using a unified re-annotation procedure. These gene lists were compared to find significant overlaps between studies. Conclusions: Applied to this large compendium of data sets, meta-analyses demonstrated that conserved patterns between species could be identified. Focusing on a specific pathology (Duchenne Muscular Dystrophy) we validated results across independent studies and revealed robust biomarkers and new pathways of interest. The meta-analyses performed with MADMuscle show the usefulness of this approach. Our method can be applied to all public transcriptome data.
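
The abstract above mentions comparing gene lists to find significant overlaps but does not say which test was used. As one common possibility, and purely as an assumption, the sketch below scores an overlap with a one-sided hypergeometric test. The function name `overlap_pvalue` and the `universe_size` parameter are hypothetical.

```python
from math import comb


def overlap_pvalue(list_a, list_b, universe_size):
    """One-sided hypergeometric p-value for the overlap of two gene lists.

    Assumption: both lists are drawn from a shared universe of
    `universe_size` genes. Returns (overlap_count, p_value), where p_value
    is the probability of an overlap at least this large by chance.
    """
    a, b = set(list_a), set(list_b)
    n_a, n_b = len(a), len(b)
    if max(n_a, n_b) > universe_size:
        raise ValueError("universe_size must be at least as large as each list")

    k = len(a & b)
    total = comb(universe_size, n_b)
    # Sum the upper tail P(X >= k); math.comb returns 0 for impossible draws.
    tail = sum(
        comb(n_a, i) * comb(universe_size - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return k, tail / total


if __name__ == "__main__":
    # Hypothetical gene symbols, used only to exercise the function.
    study_1 = ["G1", "G2", "G3", "G4"]
    study_2 = ["G3", "G4", "G5"]
    print(overlap_pvalue(study_1, study_2, universe_size=1000))
```

When many pairs of lists are compared, as in a meta-analysis over hundreds of data sets, the resulting p-values would normally need a multiple-testing correction, which this sketch omits.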

    Analysis of TaqMan Array Cards Data by an Assumption-Free Improvement of the maxRatio Algorithm Is More Accurate than the Cycle-Threshold Method

    Quantitative PCR diagnostic platforms are moving towards increased sample throughput, with instruments capable of carrying out thousands of reactions at once already in use. The need for a computational tool to reliably assist in the validation of the results is therefore compelling. In the present study, 328 residual clinical samples provided by Public Health England at Addenbrooke's Hospital (Cambridge, UK) were processed by TaqMan Array Card assay, generating 15,744 reactions from 54 targets. The amplification data were analysed by the conventional cycle-threshold (CT) method and by an improvement of the maxRatio (MR) algorithm developed to filter out the reactions with irregular amplification profiles. The reactions were also independently validated by three raters and a consensus was generated from their classification. The inter-rater agreement by Fleiss' kappa was 0.885; the agreement between either CT or MR with the raters gave Fleiss' kappa 0.884 and 0.902, respectively. Based on the consensus classification, the CT and MR methods achieved an assay accuracy of 0.979 and 0.987, respectively. These results suggested that the assumption-free MR algorithm was more reliable than the CT method, with clear advantages for the diagnostic settings.
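
Agreement in this abstract is reported as Fleiss' kappa. The sketch below shows how that statistic is computed from a table of rating counts; it is a generic textbook implementation, not the authors' code, and the function name `fleiss_kappa` and the input layout are assumptions.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] = number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters (at least two).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")

    n_categories = len(counts[0])
    total = n_items * n_raters

    # Proportion of all ratings that fall into each category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Observed agreement per item, then averaged over items.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items

    # Agreement expected by chance.
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")

    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical data: three raters labelling five reactions as
    # (positive, negative); each row sums to 3.
    ratings = [[3, 0], [0, 3], [2, 1], [3, 0], [0, 3]]
    print(round(fleiss_kappa(ratings), 3))
```

With three raters and two categories, as in the hypothetical usage above, each row holds the number of raters calling a reaction positive and the number calling it negative.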

    Development of a fish-based index to assess the eutrophication status of European lakes

    The use of the CEN (European Committee for Standardization) standard method for sampling fish in lakes using multi-mesh gillnets allowed the collection of fish assemblages of 445 European lakes in 12 countries. The lakes were additionally characterised by environmental drivers and eutrophication proxies. Following a site-specific approach including a validation procedure, a fish index including two abundance metrics (catch per unit effort expressed as fish number and biomass) and one functional metric of composition (abundance of omnivorous fish) was developed. Correlated with the proxy of eutrophication, this index discriminates between heavily and moderately impacted lakes. Additional analyses on a subset of data from Nordic lakes revealed a stronger correlation between the new fish index and the pressure data. Despite an uneven geographical distribution of the lakes and certain shortcomings in the environmental and pressure data, the fish index proved to be useful for ecological status assessment of lakes applying standardised protocols and thus supports the development of national lake fish assessment tools in line with the European Water Framework Directive