    The endogenous retrovirus ENS-1 provides active binding sites for transcription factors in embryonic stem cells that specify extra embryonic tissue

    Background: Long terminal repeats (LTRs) from endogenous retroviruses (ERVs) are a source of binding sites for transcription factors that affect host regulatory networks in different cell types, including pluripotent cells. The embryonic epiblast is made of pluripotent cells that are subjected to opposing transcriptional regulatory networks to give rise to distinct embryonic and extraembryonic lineages. To assess the transcriptional contribution of ERVs to early developmental processes, we have characterized in vitro and in vivo the regulation of ENS-1, a host-adopted and developmentally regulated ERV that is expressed in chick embryonic stem cells.
    Results: We show that Ens-1 LTR activity is controlled by two transcriptional pathways that drive pluripotent cells to alternative developmental fates. Indeed, both Nanog, which maintains pluripotency, and Gata4, which induces differentiation toward extraembryonic endoderm, independently activate the LTR. Ets coactivators are required to support the activity of Gata factors, thus preventing inappropriate activation before epigenetic silencing occurs during differentiation. Consistent with their expression patterns during chick embryonic development, Gata4, Nanog and Ets1 are recruited to the LTR in embryonic stem cells; in the epiblast, the complementary expression of Nanog and Gata/Ets correlates with the Ens-1 gene expression pattern; and Ens-1 transcripts are also detected in the hypoblast, an extraembryonic tissue expressing Gata4 and Ets2 but not Nanog. Accordingly, overexpression of Gata4 in embryos induces ectopic expression of Ens-1.
    Conclusion: Our results show that Ens-1 LTRs have co-opted conditions required for the emergence of extraembryonic tissues from pluripotent epiblast cells. By providing pluripotent cells with intact binding sites for Gata, Nanog, or both, Ens-1 LTRs may promote distinct transcriptional networks in embryonic stem cell subpopulations and prime the separation between embryonic and extraembryonic fates.

    Pan-cancer analysis of whole genomes

    Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale(1-3). Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4-5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter(4); identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation(5,6); analyses timings and patterns of tumour evolution(7); describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity(8,9); and evaluates a range of more-specialized features of cancer genomes(8,10-18).

    Germ Cell-Specific Targeting of DICER or DGCR8 Reveals a Novel Role for Endo-siRNAs in the Progression of Mammalian Spermatogenesis and Male Fertility

    Small non-coding RNAs act as critical regulators of gene expression and are essential for male germ cell development and spermatogenesis. Previously, we showed that germ cell-specific inactivation of Dicer1, an endonuclease essential for the biogenesis of microRNAs (miRNAs) and endogenous small interfering RNAs (endo-siRNAs), led to complete male infertility due to alterations in meiotic progression, increased spermatocyte apoptosis and defects in the maturation of spermatozoa. To dissect the distinct physiological roles of miRNAs and endo-siRNAs in spermatogenesis, we compared the testicular phenotypes of mice with Dicer1 or Dgcr8 depletion in male germ cells. Dgcr8 mutant mice, which have a defective miRNA pathway but retain an intact endo-siRNA pathway, were also infertile and displayed defects similar to, although less severe than, those of Dicer1 mutant mice. These included cumulative defects in the meiotic and haploid phases of spermatogenesis, resulting in oligo-, terato- and azoospermia. In addition, RNA sequencing of purified spermatocytes showed that inactivation of Dicer1, and the resulting absence of miRNAs, affected the fine-tuning of protein-coding gene expression by increasing low-level gene expression. Overall, these results emphasize the essential role of miRNAs in the progression of spermatogenesis, but also indicate a role for endo-siRNAs in this process.

    Pan-conserved segment tags identify ultra-conserved sequences across assemblies in the human pangenome

    The human pangenome, a new reference sequence, addresses many limitations of the current GRCh38 reference. The first release is based on 94 high-quality haploid assemblies from individuals with diverse backgrounds. We employed a k-mer indexing strategy for comparative analysis across multiple assemblies, including the pangenome reference, GRCh38, and CHM13, a telomere-to-telomere reference assembly. This k-mer indexing approach enabled us to identify a collection of sequences conserved across all assemblies, referred to as “pan-conserved segment tags” (PSTs). By examining the intervals between these segments, we distinguished highly conserved genomic segments from those with structural polymorphisms. We found 60,764 polymorphic intervals with unique geo-ethnic features in the pangenome reference. In this study, we used these ultra-conserved sequences (PSTs) to link human pangenome assemblies to reference genomes. This methodology enables the examination of any sequence of interest within the pangenome, using the reference genome as a comparative framework.
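    The abstract does not spell out the indexing procedure. As a minimal sketch of the general idea only, assuming a "pan-conserved" k-mer is one that occurs exactly once in every assembly (an illustrative definition, not necessarily the authors'), shared single-copy k-mers can be found by counting k-mers per assembly and intersecting the single-copy sets. The function names are hypothetical, and only the forward strand is scanned.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every k-mer on the forward strand of one assembly sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def pan_conserved_kmers(assemblies: list[str], k: int) -> set[str]:
    """Return k-mers that occur exactly once in every assembly.

    Hypothetical helper. Assumption for illustration: "pan-conserved"
    means present in all assemblies and single-copy in each, so every
    such k-mer anchors to one unambiguous position per assembly.
    """
    shared = None
    for seq in assemblies:
        # Keep only k-mers seen exactly once in this assembly.
        unique = {kmer for kmer, n in kmer_counts(seq, k).items() if n == 1}
        # Intersect with the single-copy sets of all earlier assemblies.
        shared = unique if shared is None else shared & unique
    return shared or set()


if __name__ == "__main__":
    toy = ["ACGTACGGA", "TTACGTACGGA", "ACGTACGGATT"]
    print(sorted(pan_conserved_kmers(toy, 4)))
```

    With k = 4 on these toy sequences, TACG is excluded because it occurs twice in the second sequence. A realistic pipeline would also handle reverse complements and use a much larger k.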

    A draft human pangenome reference

    Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals(1). These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.

    Gaps and complex structurally variant loci in phased genome assemblies

    There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still contains more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks and misorientations in 182 haploid assemblies obtained from a diversity panel of 77 unique human samples. Although trio-based approaches using HiFi are the current gold standard, chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. Importantly, the majority of assembly gaps cluster near the largest and most identical repeats (including segmental duplications [35.4%], satellite DNA [22.3%] and GA/AT-rich regions [27.4%]). Consequently, 1,513 protein-coding genes overlap assembly gaps in at least one haplotype, and 231 are recurrently disrupted or missing from five or more haplotypes. Furthermore, we estimate that 6–7 Mbp of DNA are misoriented per haplotype, irrespective of whether trio-free or trio-based approaches are used. Of these misorientations, 81% correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large segmental duplications. We also identify large-scale alignment discontinuities consistent with 11.9 Mbp of deletions and 161.4 Mbp of insertions per haploid genome. Although 99% of this variation corresponds to satellite DNA, we identify 230 regions of euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Such variable and incompletely assembled regions are important targets for future algorithmic development and pangenome representation.