27 research outputs found

    Functional analysis of structural variants in single cells using Strand-seq

    Somatic structural variants (SVs) are widespread in cancer, but their impact on disease evolution is understudied due to a lack of methods to directly characterize their functional consequences. We present a computational method, scNOVA, which uses Strand-seq to perform haplotype-aware integration of SV discovery and molecular phenotyping in single cells by using nucleosome occupancy to infer gene expression as a readout. Application to leukemias and cell lines identifies local effects of copy-balanced rearrangements on gene deregulation, and consequences of SVs on aberrant signaling pathways in subclones. We discovered distinct SV subclones with dysregulated Wnt signaling in a chronic lymphocytic leukemia patient. We further uncovered the consequences of subclonal chromothripsis in T cell acute lymphoblastic leukemia, which revealed c-Myb activation and enrichment of a primitive cell state, and informed successful targeting of the subclone in cell culture using a Notch inhibitor. By directly linking SVs to their functional effects, scNOVA enables systematic single-cell multiomic studies of structural variation in heterogeneous cell populations.

    Recurrent inversion polymorphisms in humans associate with genetic instability and genomic disorders

    Unlike copy number variants (CNVs), inversions remain an underexplored genetic variation class. By integrating multiple genomic technologies, we discover 729 inversions in 41 human genomes. Approximately 85% of inversions […] retrotransposition; 80% of the larger inversions are balanced and affect twice as many nucleotides as CNVs. Balanced inversions show an excess of common variants, and 72% are flanked by segmental duplications (SDs) or retrotransposons. Since flanking repeats promote non-allelic homologous recombination, we developed complementary approaches to identify recurrent inversion formation. We describe 40 recurrent inversions encompassing 0.6% of the genome, showing inversion rates up to 2.7 × 1

    Complex genetic variation in nearly complete human genomes

    Diverse sets of complete human genomes are required to construct a pangenome reference and to understand the extent of complex structural variation. Here we sequence 65 diverse human genomes and build 130 haplotype-resolved assemblies (median continuity of 130 Mb), closing 92% of all previous assembly gaps [1,2] and reaching telomere-to-telomere status for 39% of the chromosomes. We highlight complete sequence continuity of complex loci, including the major histocompatibility complex (MHC), SMN1/SMN2, NBPF8 and AMY1/AMY2, and fully resolve 1,852 complex structural variants. In addition, we completely assemble and validate 1,246 human centromeres. We find up to 30-fold variation in α-satellite higher-order repeat array length and characterize the pattern of mobile element insertions into α-satellite higher-order repeat arrays. Although most centromeres predict a single site of kinetochore attachment, epigenetic analysis suggests the presence of two hypomethylated regions for 7% of centromeres. Combining our data with the draft pangenome reference [1] significantly enhances genotyping accuracy from short-read data, enabling whole-genome inference [3] to a median quality value of 45. Using this approach, 26,115 structural variants per individual are detected, substantially increasing the number of structural variants now amenable to downstream disease association studies.

    Gaps and complex structurally variant loci in phased genome assemblies

    There has been tremendous progress in the production of phased genome assemblies by combining long-read data with parental information or linking read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still generates more than ~140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 77 phased and assembled human genomes (154 unique haplotypes). We find that trio-based approaches using HiFi are the current gold standard although chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. We find two-thirds of defined contig ends cluster near the largest and most identical repeats [including segmental duplications (35.4%) or satellite DNA (22.3%) or to regions enriched in GA/AT-rich DNA (27.4%)]. As a result, 1513 protein-coding genes overlap assembly gaps in at least one haplotype and 231 are recurrently disrupted or missing from five or more haplotypes. In addition, we estimate that 6-7 Mbp of DNA are incorrectly orientated per haplotype irrespective of whether trio-free or trio-based approaches are employed. 81% of such misorientations correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large identical segmental duplications. In addition, we also identify large-scale alignment discontinuities consistent with an 11.9 Mbp deletion and 161.4 Mbp of insertion per human haploid genome. While 99% of this variation corresponds to satellite DNA, we identify 230 regions of the euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Although not completely resolved, these regions include copy number polymorphic and biomedically relevant genic regions where complete resolution and a pangenome representation will be most useful, yet most challenging, to realize.

    Additional file 2 of Inversion polymorphism in a complete human genome assembly

    Additional file 2: Table S1. Nonredundant inversion callset reported in this study. Table S2. Putative novel inversions with respect to T2T-CHM13 reference. Table S3. Enrichment of inversions in pericentromeric regions. Table S4. List of minor alleles and resolved orientation errors in GRCh38. Table S5. Novel inversions in HPRC Strand-seq dataset

    Inversion polymorphism in a complete human genome assembly

    The telomere-to-telomere (T2T) complete human reference has significantly improved our ability to characterize genome structural variation. To understand its impact on inversion polymorphisms, we remapped data from 41 genomes against the T2T reference genome and compared it to the GRCh38 reference. We find a ~21% increase in sensitivity improving mapping of 63 inversions on the T2T reference. We identify 26 misorientations within GRCh38 and show that the T2T reference is three times more likely to represent the correct orientation of the major human allele. Analysis of 10 additional samples reveals novel rare inversions at chromosomes 15q25.2, 16p11.2, 16q22.1–23.1, and 22q11.21.

    Additional file 1 of Inversion polymorphism in a complete human genome assembly

    Additional file 1: Figure S1. T2T-CHM13 inversion callset summary and comparison to GRCh38 (n = 373). Figure S2. Differences between GRCh38 and T2T-CHM13 callsets. Figure S3. Inversion callset summary with respect to T2T-CHM13 reference. Figure S4. Nonsyntenic and likely novel sites in T2T-CHM13 inversion calls. Figure S5. Enrichment of inversions in pericentromeric regions. Figure S6. Sequence composition of inversions from pericentromeric regions. Figure S7. Novel pericentromeric inversion on chromosome 1. Figure S8. Complete assemblies of chromosome 1 centromeric region. Figure S9. Relative position of alpha satellite array and novel pericentromeric inversion on chromosome 1. Figure S10. Inversion phasing at pericentromeric region of chromosome 7. Figure S11. Evaluation of putative misorients in GRCh38 with respect to T2T-CHM13. Figure S12. Evaluation of inversion differences between GRCh38 and T2T-CHM13 references. Figure S13. Examples of minor and misoriented alleles at chromosome 16. Figure S14. Structural differences at Xq28 between GRCh38 and T2T-CHM13. Figure S15. Diverse structural haplotypes at the Xq28 region. Figure S16. Structural differences at 16p12.2 between GRCh38 and T2T-CHM13. Figure S17. Topological differences at 16p12.2 between GRCh38 and T2T-CHM13. Figure S18. Rare inversions at disease relevant loci. Figure S19. Diverse structural haplotypes at 15q25.2 region. Figure S20. Assembled inversion breakpoints at 15q25.2 and inversion breakpoint mapping. Figure S21. Example of long-lasting misorientation errors in previous human genome references. Supplementary Notes. Consortia