223 research outputs found

Discovery and genotyping of structural variation from long-read haploid genome sequence data

Author: Boitano Matthew
Chaisson Mark J.P.
Chin Chen-Shin
Eichler Evan E
Gordon David
Graves-Lindsay Tina A
Hoekzema Kendra
Huddleston John
Korlach Jonas
Kronenberg Zev N
Munson Katherine M
Peluso Paul
Steinberg Karyn Meltz
Vives Laura
Warren Wes
Wilson Richard K
Publication venue: Digital Commons@Becker
Publication date: 01/01/2016

In an effort to more fully understand the full spectrum of human genetic variation, we generated deep single-molecule, real-time (SMRT) sequencing data from two haploid human genomes. By using an assembly-based approach (SMRT-SV), we systematically assessed each genome independently for structural variants (SVs) and indels, resolving the sequence structure of 461,553 genetic variants from 2 bp to 28 kbp in length. We find that >89% of these variants have been missed as part of analysis of the 1000 Genomes Project even after adjusting for more common variants (MAF > 1%). We estimate that this theoretical human diploid differs by as much as ∼16 Mbp with respect to the human reference, with long-read sequencing data providing a fivefold increase in sensitivity for genetic variants ranging in size from 7 bp to 1 kbp compared with short-read sequence data. Although a large fraction of genetic variants were not detected by short-read approaches, once the alternate allele is sequence-resolved, we show that 61% of SVs can be genotyped in short-read sequence data sets with high accuracy. Uncoupling discovery from genotyping thus allows for the majority of this missed common variation to be genotyped in the human population. Interestingly, when we repeat SV detection on a pseudodiploid genome constructed in silico by merging the two haploids, we find that ∼59% of the heterozygous SVs are no longer detected by SMRT-SV. These results indicate that haploid resolution of long-read sequencing data will significantly increase sensitivity of SV detection.

Crossref

Digital Commons@Becker

Reconstructing complex regions of genomes using long-read sequencing technology

Author: Can Alkan
Evan E. Eichler
Francesca Antonacci
John Huddleston
Jonas Korlach
Lawrence Hon
Maika Malig
Mark Chaisson
Megan Y. Dennis
Peter H. Sudmant
Richard K. Wilson
Stephen W. Turner
Swati Ranade
Tina A. Graves
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 09/01/2014

Obtaining high-quality sequence continuity of complex regions of recent segmental duplication remains one of the major challenges of finishing genome assemblies. In the human and mouse genomes, this was achieved by targeting large-insert clones using costly and laborious capillary-based sequencing approaches. Sanger shotgun sequencing of clone inserts, however, has now been largely abandoned, leaving most of these regions unresolved in newer genome assemblies generated primarily by next-generation sequencing hybrid approaches. Here we show that it is possible to resolve regions that are complex in a genome-wide context but simple in isolation for a fraction of the time and cost of traditional methods using long-read single-molecule, real-time (SMRT) sequencing and assembly technology from Pacific Biosciences (PacBio). We sequenced and assembled BAC clones corresponding to a 1.3-Mbp complex region of chromosome 17q21.31, demonstrating 99.994% identity to Sanger assemblies of the same clones. We targeted 44 differences using Illumina sequencing and find that PacBio and Sanger assemblies share a comparable number of validated variants, albeit with different sequence context biases. Finally, we targeted a poorly assembled 766-kbp duplicated region of the chimpanzee genome and resolved the structure and organization for a fraction of the cost and time of traditional finishing approaches. Our data suggest a straightforward path for upgrading genomes to a higher quality finished state.

Crossref

Bilkent University Institutional Repository

PubMed Central

eScholarship - University of California

Assembly of long error-prone reads using de Bruijn graphs

Author: Chaisson Mark
Kolmogorov Mikhail
Lin Yu
Pevzner Pavel A.
Shen Max W.
Yuan Jeffrey
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 29/11/2018

The recent breakthroughs in assembling long error-prone reads were based on the overlap-layout-consensus (OLC) approach and did not utilize the strengths of the alternative de Bruijn graph approach to genome assembly. Moreover, these studies often assume that applications of the de Bruijn graph approach are limited to short and accurate reads and that the OLC approach is the only practical paradigm for assembling long error-prone reads. We show how to generalize de Bruijn graphs for assembling long error-prone reads and describe the ABruijn assembler, which combines the de Bruijn graph and the OLC approaches and results in accurate genome reconstructions.
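For readers unfamiliar with the de Bruijn graph approach named in this abstract, the following is a minimal sketch of the classical construction on short, error-free toy reads. It is not the ABruijn algorithm: the abstract does not describe that method in detail, and this sketch makes no attempt to handle sequencing errors. The function names `build_de_bruijn` and `spell_unique_path`, the toy reads, and the choice of k are all illustrative assumptions.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some read.

    Every k-mer in a read contributes one edge: prefix -> suffix.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def spell_unique_path(graph, start):
    """Extend a sequence from `start` while the next node is unambiguous.

    Stops at a dead end, at a branch (more than one distinct successor),
    or when a node would be revisited.
    """
    path = start
    node = start
    seen = {start}
    while True:
        successors = set(graph.get(node, []))
        if len(successors) != 1:
            break
        node = successors.pop()
        if node in seen:
            break
        seen.add(node)
        path += node[-1]
    return path


# Toy, error-free overlapping reads (hypothetical data).
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = build_de_bruijn(reads, k=4)
contig = spell_unique_path(graph, "ATG")
print(contig)
```

With error-prone reads, most k-mers would contain errors and the graph would fragment or branch heavily, which is the limitation the abstract says the generalized approach is meant to overcome.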

The Australian National University

Profiling variable-number tandem repeat variation across populations using repeat-pangenome graphs.

Author: Chaisson Mark J P
Lee Charles
Lu Tsung-Yu
Human Genome Structural Variation Consortium
Zhu Qihui
Publication venue: The Mouseion at the JAXlibrary
Publication date: 12/07/2021

Variable number tandem repeats (VNTRs) are composed of consecutive repetitive DNA with hypervariable repeat count and composition. They include protein-coding sequences and associations with clinical disorders. It has been difficult to incorporate VNTR analysis in disease studies that use short-read sequencing because the traditional approach of mapping to the human reference is less effective for repetitive and divergent sequences. In this work, we solve VNTR mapping for short reads with a repeat-pangenome graph (RPGG), a data structure that encodes both the population diversity and repeat structure of VNTR loci from multiple haplotype-resolved assemblies. We develop software to build an RPGG, and use the RPGG to estimate VNTR composition with short reads. We use this to discover VNTRs with length stratified by continental population, and expression quantitative trait loci, indicating that RPGG analysis of VNTRs will be critical for future studies of diversity and disease.

The Jackson Laboratory: The Mouseion at the JAXlibrary

Mechanism of organization increase in complex systems

Author: Alain
Annila
Annila
Bar-Yam
Bejan
Bejan
Bell
Bertalanffy
Blagus
Bonner
Boyer
Carneiro
Chaisson
Chaisson
Chaisson
Chaisson
Chaisson
Chatterjee
Chatterjee
Dangalchev
Gauss
Georgiev
Georgiev
Georgiev
Gershenson
Gladyshev
Goh
Goldstein
Haken
Haken
Hartonen
Hertz
Hübler
Hübler
James
Kassebaum
Kitsak
Kleiber
Kurzweil
Liu
Mark
Moore
Mäkelä
Nagy
Nicolis
Onsager
Onsager
Paltridge
Pernu
Rozenfeld
Salthe
Smart
Smyth
Tang
Ulanowicz
Vandenberg
Vidal
West
Wu
Xulvi-Brunet
Zhang
Ángeles
Publication venue: 'Wiley'
Publication date: 15/06/2014

This paper proposes a variational approach to describe the evolution of organization of complex systems from first principles, as increased efficiency of physical action. Most simply stated, physical action is the product of the energy and time necessary for motion. When complex systems are modeled as flow networks, this efficiency is defined as a decrease of action for one element to cross between two nodes, or endpoints of motion: a principle of least unit action. We find a connection with another principle, that of most total action, or a tendency for increase of the total action of a system. This increase provides more energy and time for minimization of the constraints to motion in order to decrease unit action, and therefore to increase organization. Also, with the decrease of unit action in a system, its capacity for total amount of action increases. We present a model of positive feedback between action efficiency and the total amount of action in a complex system, based on a system of ordinary differential equations, which leads to an exponential growth with time of each and a power law relation between the two. We present an agreement of our model with data for core processing units of computers. This approach can help to describe, measure, manage, design and predict future behavior of complex systems to achieve the highest rates of self-organization and robustness.
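The step from exponential growth of each quantity to a power law between them can be written out briefly. The symbols below are assumed notation for illustration only, since the abstract does not give the model's equations: $\alpha$ for action efficiency, $Q$ for total action, and $a$, $b$ for their growth rates.

```latex
\begin{aligned}
\alpha(t) &= \alpha_0 e^{a t}, \qquad Q(t) = Q_0 e^{b t}, \\
t &= \frac{1}{a}\ln\frac{\alpha}{\alpha_0}, \\
Q &= Q_0 \exp\!\left(\frac{b}{a}\ln\frac{\alpha}{\alpha_0}\right)
   = Q_0\left(\frac{\alpha}{\alpha_0}\right)^{b/a}.
\end{aligned}
```

Eliminating time between two exponentials always gives a power law, with the exponent equal to the ratio of the growth rates.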

arXiv.org e-Print Archive

Crossref

Digital Commons @ Assumption College

Scalable Nanopore Sequencing of Human Genomes Provides a Comprehensive View of Haplotype-Resolved Variation and Methylation

Author: Alvarez Jerez Pilar
Asri Mobin
Behera Sairam
Billingsley Kimberley J
Blauwendraat Cornelis
Carnevali Paolo
Chaisson Mark
Daida Kensuke
Dewan Ramita
Genner Rylee M
Jain Miten
Kolmogorov Mikhail
Lorig-Roach Ryan
Malik Laksh
Mastoras Mira
Meredith Melissa
Miga Karen H
Monlong Jean
Paten Benedict
Pesout Trevor
Phillippy Adam M
Prabakaran Jeshuwin
Reed Xylena
Rhie Arang
Scholz Sonja W
Sedlazeck Fritz J
Shafin Kishwar
Timp Winston
Traynor Bryan J
Yang Jianzhi
Publication venue: DigitalCommons@TMC
Publication date: 01/10/2023

Long-read sequencing technologies substantially overcome the limitations of short reads but have not been considered as a feasible replacement for population-scale projects, being a combination of too expensive, not scalable enough or too error-prone. Here we develop an efficient and scalable wet lab and computational protocol, Napu, for Oxford Nanopore Technologies long-read sequencing that seeks to address those limitations. We applied our protocol to cell lines and brain tissue samples as part of a pilot project for the National Institutes of Health Center for Alzheimer's and Related Dementias. Using a single PromethION flow cell, we can detect single nucleotide polymorphisms with F1-score comparable to Illumina short-read sequencing. Small indel calling remains difficult within homopolymers and tandem repeats, but achieves good concordance to Illumina indel calls elsewhere. Further, we can discover structural variants with F1-score on par with state-of-the-art de novo assembly methods. Our protocol phases small and structural variants at megabase scales and produces highly accurate, haplotype-specific methylation calls.

DigitalCommons@The Texas Medical Center

Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants

Author: AE Urban
D Pinkel
DA Wheeler
DR Bentley
DR Zerbino
F Sanger
GH Perry
J Butler
J Rozowsky
JC Dohm
JC Venter
Jiang Du
JO Korbel
JO Korbel
JY Hehir-Kwa
M Margulies
M Pop
M Pop
Mark B. Gerstein
Michael Snyder
MJ Chaisson
PA Pevzner
R Lippert
R Redon
R Schmid
RL Warren
Robert D. Bjornson
RR Selzer
S Batzoglou
S Levy
SMD Goldberg
V Bansal
William Stafford Noble
Yong Kong
Zhengdong D. Zhang
Publication venue: Public Library of Science
Publication date: 01/07/2009

The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mappability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs.
Our strategy should facilitate the sequencing of human genomes at maximum accuracy and low cost.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

An integrated map of structural variation in 2,504 human genomes

Author: Abyzov Alexej
Alkan Can
Antaki Danny
Auton Adam
Bae Taejeong
Casale Francesco Paolo
Cerveira Eliza
Chaisson Mark J.P.
Chen Jieming
Chen Ken
Chines Peter
Chong Zechen
Dayama Gargi
Fritz Markus His Yang
Gardner Eugene J.
Garrison Erik
Handsaker Robert E.
Hormozdiari Fereydoun
Huddleston John
Jun Goo
Kashin Seva
Konkel Miriam K.
Lam Hugo Y.K.
Malhotra Ankit
Malig Maika
Meiers Sascha
Mu Xinmeng Jasmine
Rausch Tobias
Shi Xinghua
Stütz Adrian M.
Sudmant Peter H.
Walter Klaudia
Ye Kai
Zhang Yan
Publication venue: LSU Digital Commons
Publication date: 30/09/2015

Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.

LSU Scholarly Repository (Louisiana State Univ.)

Characterization of the Conus bullatus genome and its venom-duct transcriptome

Background: The venomous marine gastropods, cone snails (genus Conus), inject prey with a lethal cocktail of conopeptides, small cysteine-rich peptides, each with a high affinity for its molecular target, generally an ion channel, receptor or transporter. Over the last decade, conopeptides have proven indispensable reagents for the study of vertebrate neurotransmission. Conus bullatus belongs to a clade of Conus species called Textilia, whose pharmacology is still poorly characterized. Thus the genomics analyses presented here provide the first step toward a better understanding of the enigmatic Textilia clade. Results: We have carried out a sequencing survey of the Conus bullatus genome and venom-duct transcriptome. We find that conopeptides are highly expressed within the venom-duct, and describe an in silico pipeline for their discovery and characterization using RNA-seq data. We have also carried out low-coverage shotgun sequencing of the genome, and have used these data to determine its size, genome-wide base composition, simple repeat, and mobile element densities. Conclusions: Our results provide the first global view of venom-duct transcription in any cone snail. A notable feature of Conus bullatus venoms is the breadth of A-superfamily peptides expressed in the venom duct, which are unprecedented in their structural diversity. We also find SNP rates within conopeptides are higher compared to the remainder of the C. bullatus transcriptome, consistent with the hypothesis that conopeptides are under diversifying selection.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central