Profiling variable-number tandem repeat variation across populations using repeat-pangenome graphs.

Author: Chaisson Mark J P
Lee Charles
Lu Tsung-Yu
Human Genome Structural Variation Consortium
Zhu Qihui
Publication venue: The Mouseion at the JAXlibrary
Publication date: 12/07/2021

Variable number tandem repeats (VNTRs) are composed of consecutive repetitive DNA with hypervariable repeat count and composition. They include protein-coding sequences and have known associations with clinical disorders. It has been difficult to incorporate VNTR analysis into disease studies that use short-read sequencing because the traditional approach of mapping to the human reference is less effective for repetitive and divergent sequences. In this work, we solve VNTR mapping for short reads with a repeat-pangenome graph (RPGG), a data structure that encodes both the population diversity and the repeat structure of VNTR loci from multiple haplotype-resolved assemblies. We develop software to build an RPGG and use the RPGG to estimate VNTR composition with short reads. We use this approach to discover VNTRs whose length is stratified by continental population, as well as expression quantitative trait loci, indicating that RPGG analysis of VNTRs will be critical for future studies of diversity and disease.
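The abstract describes the RPGG only at a high level. To illustrate the general idea, here is a minimal Python sketch that assumes a plain k-mer (de Bruijn-style) graph: repeat copies from several haplotypes collapse onto shared k-mer nodes, and counting read k-mers against those nodes gives a crude estimate of repeat length. The functions `build_graph`, `count_read_kmers` and `estimate_copies` are hypothetical names for illustration; this is not the authors' software, and it ignores details a real tool would need, such as sequencing errors and reverse-complement strands.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_graph(haplotypes, k):
    """Hypothetical sketch: map each k-mer (node) to the set of k-mers that follow it.

    Repeat copies within and across haplotypes collapse onto the same nodes,
    so the tandem repeat appears as a cycle whose size does not grow with copy number.
    """
    graph = {}
    for hap in haplotypes:
        ks = list(kmers(hap, k))
        for km in ks:
            graph.setdefault(km, set())
        for a, b in zip(ks, ks[1:]):
            graph[a].add(b)
    return graph


def count_read_kmers(graph, reads, k):
    """Count occurrences of graph k-mers in the reads; k-mers absent from the graph are ignored."""
    counts = Counter()
    for read in reads:
        for km in kmers(read, k):
            if km in graph:
                counts[km] += 1
    return counts


def estimate_copies(counts, motif_kmer, flank_kmer):
    """Crude length proxy (assumption): motif k-mer count per count of a unique flanking k-mer."""
    return counts[motif_kmer] / counts[flank_kmer]


if __name__ == "__main__":
    # Toy locus: "TTT" flanks around 3 or 5 copies of the motif "ACG".
    hap3 = "TTT" + "ACG" * 3 + "TTT"
    hap5 = "TTT" + "ACG" * 5 + "TTT"
    g = build_graph([hap3, hap5], k=3)
    print(len(g))  # 8 nodes, regardless of copy number
    # Whole haplotypes stand in for reads here; real short reads are much shorter.
    counts = count_read_kmers(g, [hap5], k=3)
    print(estimate_copies(counts, "ACG", "TTA"))  # 5.0
```

"TTA" occurs once per haplotype, so it serves as the unique flank reference. Real short reads cover a locus unevenly, so in practice this ratio would only be a rough proxy for repeat length.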

The Jackson Laboratory: The Mouseion at the JAXlibrary

Mako: a graph-based pattern growth approach to detect complex structural variants

Author: Devine Scott E
Eichler Evan E
Guo Li
Jia Yanyan
Kosters Walter
Lee Charles
Lin Jiadong
Ryan Mallory
The Human Genome Structural Variation Consortium
Wang Songbo
Xu Tun
Yang Xiaofei
Ye Kai
Zhang Chengsheng
Zhu Qihui
Publication venue: 'Elsevier BV'
Publication date: 01/01/2022
Field of study

Computer Systems, Imagery and Media

The Jackson Laboratory: The Mouseion at the JAXlibrary

Leiden University Scholarly Publications

MDC Repository

Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads

Author: Audano Peter A
Chaisson Mark J P
Devine Scott E
Ebert Peter
Ebler Jana
Eichler Evan E
Ghareghani Maryam
Harvey William T
Haukness Marina
Korbel Jan O
Lansdorp Peter M
Lee Charles
Marijon Pierre
Marschall Tobias
Munson Katherine M
Paten Benedict
Porubsky David
Sanders Ashley D
Sorensen Melanie
Human Genome Structural Variation Consortium
Sulovari Arvis
Vollger Mitchell R
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing with continuous long-read or high-fidelity sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.
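Two of the quality figures quoted above, contig N50 and switch error rate, have standard definitions that fit in a few lines. The Python sketch below uses one common form of each definition; it is not the evaluation code used in the study, and the input encodings (a list of contig lengths; for each heterozygous site, the allele on haplotype 1 coded as 0 or 1) are assumptions for illustration.

```python
def n50(lengths):
    """Largest length L such that contigs of length >= L contain at least half of all bases."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length


def switch_error_rate(predicted, truth):
    """One common definition: phase switches / (heterozygous sites - 1).

    predicted, truth: allele (0 or 1) carried by haplotype 1 at each heterozygous
    site, in genomic order (assumed encoding).
    """
    if len(predicted) != len(truth) or len(truth) < 2:
        raise ValueError("need two equally long phasings with at least two sites")
    # 1 where the predicted phase matches the truth, 0 where it is flipped
    agree = [int(p == t) for p, t in zip(predicted, truth)]
    # a switch is a change in agreement between consecutive sites
    switches = sum(a != b for a, b in zip(agree, agree[1:]))
    return switches / (len(agree) - 1)


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
print(switch_error_rate([0, 0, 1, 1, 0, 0], [0, 0, 1, 0, 1, 1]))  # 0.2
```

Because the switch count looks only at changes in agreement, a phasing that is flipped end to end has no switch errors, which matches the intuition that haplotype labels are arbitrary.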

Crossref

The Jackson Laboratory: The Mouseion at the JAXlibrary

MDC Repository

MPG.PuRe

FusionSeq: a modular framework for finding gene fusions by analyzing paired-end RNA-sequencing data

We have developed FusionSeq to identify fusion transcripts from paired-end RNA-sequencing data. FusionSeq includes filters to remove spurious candidate fusions with artifacts, such as misalignment or random pairing of transcript fragments, and it ranks candidates according to several statistics. It also has a module to identify exact sequences at breakpoint junctions. FusionSeq detected known and novel fusions in a specially sequenced calibration data set, including eight cancers with and without known rearrangements.

Crossref

Springer - Publisher Connector

PubMed Central

muCNV: Genotyping Structural Variants for Population-level Sequencing.

Author: Boerwinkle Eric
English Adam
Gibbs Richard
Jun Goo
Kang Hyun Min
Lee Charles
Metcalf Ginger
Sedlazeck Fritz
Human Genome Structural Variation Consortium
Zhu Qihui
Publication venue: 'Oxford University Press (OUP)'
Publication date: 24/03/2021

MOTIVATION: There is high demand for joint genotyping of structural variants with short-read sequencing, but efficient and accurate genotyping at population scale is a challenging task. RESULTS: We developed muCNV, which aggregates per-sample summary pileups for joint genotyping of >100,000 samples. Pilot results show very low Mendelian inconsistency. Applications to large-scale projects in the cloud show the computational efficiency of the muCNV genotyping pipeline. AVAILABILITY: muCNV is publicly available for download at: https://github.com/gjun/muCNV. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
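Mendelian inconsistency, cited above as a quality check, can be illustrated for a biallelic variant in a parent-offspring trio: the child's genotype must be formable from one allele transmitted by each parent. A minimal sketch, assuming genotypes coded as alternate-allele counts 0, 1 or 2 and `None` for a missing call; the function names are hypothetical, this is not muCNV's implementation, and multi-allelic copy-number genotypes are out of scope.

```python
# Alleles a parent with a given alt-allele count (0, 1 or 2) can transmit.
TRANSMISSIBLE = {0: {0}, 1: {0, 1}, 2: {1}}


def is_mendelian_consistent(father, mother, child):
    """True if the child's genotype can be formed from one allele of each parent."""
    possible = {a + b for a in TRANSMISSIBLE[father] for b in TRANSMISSIBLE[mother]}
    return child in possible


def mendelian_inconsistency_rate(trios):
    """Fraction of fully called (father, mother, child) trios that are inconsistent.

    Trios with any missing call (None) are skipped.
    """
    called = [t for t in trios if None not in t]
    if not called:
        raise ValueError("no fully called trios")
    inconsistent = sum(not is_mendelian_consistent(*t) for t in called)
    return inconsistent / len(called)


print(is_mendelian_consistent(0, 0, 1))  # False: neither parent carries the alt allele
print(mendelian_inconsistency_rate([(0, 0, 0), (0, 0, 1), (1, 1, 2), (None, 1, 1)]))
```

In the usage line, one of the three fully called trios is inconsistent, and the trio with a missing paternal call is excluded rather than counted as an error.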

Crossref

The Jackson Laboratory: The Mouseion at the JAXlibrary

PubMed Central

Inversion polymorphism in a complete human genome assembly

Author: Allison N. Rozanski
Ashley D. Sanders
Benedict Paten
David Porubsky
Evan E. Eichler
Hufsah Ashraf
Human Genome Structural Variation Consortium (HGSVC)
Human Pangenome Reference Consortium (HPRC)
Jan O. Korbel
Jana Ebler
Patrick Hasenfeld
Tobias Marschall
William T. Harvey
Wolfram Höps
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2023

The telomere-to-telomere (T2T) complete human reference has significantly improved our ability to characterize genome structural variation. To understand its impact on inversion polymorphisms, we remapped data from 41 genomes against the T2T reference genome and compared the results with those obtained against the GRCh38 reference. We find a ~21% increase in sensitivity, improving the mapping of 63 inversions on the T2T reference. We identify 26 misorientations within GRCh38 and show that the T2T reference is three times more likely to represent the correct orientation of the major human allele. Analysis of 10 additional samples reveals novel rare inversions at chromosomes 15q25.2, 16p11.2, 16q22.1–23.1, and 22q11.21.

Directory of Open Access Journals

Mako: A Graph-based Pattern Growth Approach to Detect Complex Structural Variants.

Author: Devine Scott E
Eichler Evan E
Guo Li
Jia Yanyan
Kosters Walter
Lee Charles
Lin Jiadong
Ryan Mallory
Human Genome Structural Variation Consortium
Wang Songbo
Xu Tun
Yang Xiaofei
Ye Kai
Zhang Chengsheng
Zhu Qihui
Publication venue: 'Elsevier BV'
Publication date: 02/07/2021

Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are considered to be the simultaneous occurrence of simple structural variants. However, detecting the compounded mutational signals of CSVs is challenging with the commonly used model-match strategy. As a result, there has been limited progress in CSV discovery compared with simple structural variants. We systematically analyzed the multi-breakpoint connection feature of CSVs and proposed Mako, which uses a bottom-up guided model-free strategy to detect CSVs from paired-end short-read sequencing. Specifically, we implemented a graph-based pattern growth approach, in which the graph depicts potential breakpoint connections and pattern growth enables CSV detection without predefined models. Comprehensive evaluations on both simulated and real datasets revealed that Mako outperformed other algorithms. Notably, validation rates of CSVs on real data, based on experimental and computational validations as well as manual inspection, are around 70%, and the medians of the experimental and computational breakpoint shifts are 13 bp and 26 bp, respectively. Moreover, the Mako CSV subgraph effectively characterized the breakpoint connections of a CSV event and uncovered a total of 15 CSV types, including two novel types: adjacent segments swap and tandem dispersed duplication. Further analysis of these CSVs also revealed the impact of sequence homology on the formation of CSVs. Mako is publicly available at https://github.com/xjtu-omics/Mako
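The abstract does not describe Mako's pattern-growth algorithm in enough detail to reproduce. As a much simpler stand-in that only illustrates the graph idea, the sketch below treats breakpoints as nodes and read-supported links as edges, then grows each connected subgraph from a seed breakpoint by breadth-first search; subgraphs with more than two breakpoints are reported as CSV candidates, following the definition given above. The `candidate_csvs` function and the "chrom:pos" breakpoint labels are hypothetical, and breakpoint clustering, evidence weighting and CSV typing are all omitted.

```python
from collections import defaultdict, deque


def candidate_csvs(links, min_breakpoints=3):
    """Group breakpoints connected by read evidence; keep groups with >= min_breakpoints.

    links: pairs of breakpoint IDs (hypothetical "chrom:pos" labels) joined by evidence
    such as a discordant read pair. Returns a list of sets of breakpoint IDs.
    """
    adjacency = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    groups = []
    for seed in adjacency:
        if seed in seen:
            continue
        # grow the subgraph outward from the seed, breadth-first
        component = {seed}
        queue = deque([seed])
        seen.add(seed)
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        # with the default threshold of 3, this keeps groups of more than two breakpoints
        if len(component) >= min_breakpoints:
            groups.append(component)
    return groups


links = [
    ("chr1:100", "chr1:500"),
    ("chr1:500", "chr1:900"),
    ("chr2:10", "chr2:80"),  # simple SV: only two breakpoints
]
print(candidate_csvs(links))
```

Here the two chr1 links share a breakpoint and form one three-breakpoint candidate, while the chr2 pair looks like a simple SV and is dropped under the default threshold.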

The Jackson Laboratory: The Mouseion at the JAXlibrary

PubMed Central

An integrative TAD catalog in lymphoblastoid cell lines discloses the functional impact of deletions and insertions in human genomes.

Author: Bonder Marc Jan
Chaisson Mark J P
Eichler Evan E
Gerstein Mark B
HGSVC Functional Analysis Working Group
Human Genome Structural Variation Consortium (HGSVC)
Jensen Matthew
Korbel Jan O
Lee Charles
Li Chong
Marschall Tobias
Shi Xinghua
Syed Sabriya
Talkowski Michael E
Zody Michael C
Publication venue: The Mouseion at the JAXlibrary
Publication date: 01/12/2024

The human genome is packaged within a three-dimensional (3D) nucleus and organized into structural units known as compartments, topologically associating domains (TADs), and loops. TAD boundaries, which separate adjacent TADs, have been found to be well conserved across mammalian species and more evolutionarily constrained than TADs themselves. Recent studies show that structural variants (SVs) can modify 3D genomes through the disruption of TADs, which play an essential role in insulating genes from aberrant regulation by outside regulatory elements. However, how SVs affect 3D genome structure, and how they relate to different aspects of gene regulation and to candidate cis-regulatory elements (cCREs), have rarely been studied systematically. Here, we assess the impact of SVs intersecting with TAD boundaries by developing an integrative Hi-C analysis pipeline, which enables the generation of an in-depth catalog of TADs and TAD boundaries in human lymphoblastoid cell lines (LCLs) to fill the gap left by limited resources. Our catalog contains 18,865 TADs, including 4596 sub-TADs, with 185 SVs (TAD–SVs) that alter chromatin architecture. By leveraging the ENCODE registry of cCREs in humans, we determine that 34 of 185 TAD–SVs intersect with cCREs and observe significant enrichment of TAD–SVs within cCREs. This study provides a database of TADs and TAD–SVs in the human genome that will facilitate future investigations of the impact of SVs on chromatin structure and gene regulation in health and disease.
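The step of finding SVs that intersect TAD boundaries reduces to an interval-overlap test. A minimal Python sketch, assuming half-open (chrom, start, end) coordinates for both SVs and boundaries; the function names and toy coordinates are hypothetical, and this covers only the overlap test, not the Hi-C analysis that produces the boundaries or the cCRE enrichment statistics.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def svs_hitting_boundaries(svs, boundaries):
    """Return the SVs that overlap at least one TAD boundary.

    svs, boundaries: lists of (chrom, start, end) tuples in half-open coordinates.
    """
    # group boundaries by chromosome and sort by start coordinate
    by_chrom = {}
    for chrom, start, end in boundaries:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()

    hits = []
    for chrom, start, end in svs:
        for b_start, b_end in by_chrom.get(chrom, []):
            if b_start >= end:
                break  # sorted by start: no later boundary can overlap this SV
            if overlaps(start, end, b_start, b_end):
                hits.append((chrom, start, end))
                break
    return hits


boundaries = [("chr1", 1000, 1040), ("chr1", 5000, 5040), ("chr2", 200, 240)]
svs = [("chr1", 990, 1010), ("chr1", 2000, 3000), ("chr2", 100, 250), ("chr3", 0, 10)]
print(svs_hitting_boundaries(svs, boundaries))  # [('chr1', 990, 1010), ('chr2', 100, 250)]
```

Sorting boundaries by start lets the inner loop stop early; for genome-scale inputs an interval tree or a dedicated tool would be the more usual choice.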

The Jackson Laboratory: The Mouseion at the JAXlibrary

ARTS repository - University of Groningen

Copy number variants in psychiatric disorders

Author: Alkan
Bassett
Bouwkamp
Carson
CNV and Schizophrenia Working Groups of the Psychiatric Genomics Consortium
Costain
Degenhardt
Dunnen
Feuk
Flaherty
Forsingdal
Georgieva
Green
Grozeva
Gur
Human Genome Structural Variation Working Group
Iafrate
Karayiorgou
Lindsay
Lowther
Lupski
Martin
McDonald-McGinn
Merikangas
Michaelson
Miller
Mills
Redon
Schizophrenia Working Group of the Psychiatric Genomics Consortium
Sebat
Shaffer
Stankiewicz
Sugama
Sullivan
Thygesen
Tuzun
Wolfe
Zhou
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020

Crossref

Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads.

Author: Audano Peter A
Chaisson Mark J P
Devine Scott E
Ebert Peter
Ebler Jana
Eichler Evan E
Ghareghani Maryam
Harvey William T
Haukness Marina
Korbel Jan O
Lansdorp Peter M
Lee Charles
Marijon Pierre
Marschall Tobias
Munson Katherine M
Paten Benedict
Porubsky David
Sanders Ashley D
Sorensen Melanie
Human Genome Structural Variation Consortium
Sulovari Arvis
Vollger Mitchell R
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2021

Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing [1,2] with continuous long-read or high-fidelity [3] sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.

The Jackson Laboratory: The Mouseion at the JAXlibrary