6 research outputs found
Genome-wide variant analysis of simplex autism families with an integrative clinical-bioinformatics pipeline
Autism spectrum disorders (ASDs) are a group of developmental disabilities that affect social interaction and communication and are characterized by repetitive behaviors. There is now a large body of evidence that suggests a complex role of genetics in ASDs, in which many different loci are involved. Although many current population-scale genomic studies have been demonstrably fruitful, these studies generally focus on analyzing a limited part of the genome or use a limited set of bioinformatics tools. These limitations preclude the analysis of genome-wide perturbations that may contribute to the development and severity of ASD-related phenotypes. To overcome these limitations, we have developed and utilized an integrative clinical and bioinformatics pipeline for generating a more complete and reliable set of genomic variants for downstream analyses. Our study focuses on the analysis of three simplex autism families, each consisting of one affected child, unaffected parents, and one unaffected sibling. All members were clinically evaluated and extensively phenotyped. Genotyping arrays and whole-genome sequencing were performed on each member, and the resulting sequencing data were analyzed using a variety of available bioinformatics tools. We searched for rare variants of putative functional impact that were found to be segregating according to de novo, autosomal recessive, X-linked, mitochondrial, and compound heterozygote transmission models. The resulting candidate variants included three small heterozygous copy-number variations (CNVs), a rare heterozygous de novo nonsense mutation in MYBBP1A located within exon 1, and a novel de novo missense variant in LAMB3. Our work demonstrates how more comprehensive analyses that include rich clinical data and whole-genome sequencing data can generate reliable results for use in downstream investigations.
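The transmission-model filtering mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `classify_segregation`, the genotype encoding (0, 1, or 2 copies of the alternate allele), and the restriction to the autosomal de novo and recessive models are all assumptions made for illustration.

```python
from typing import Optional


def classify_segregation(child: int, father: int, mother: int,
                         sibling: Optional[int] = None) -> Optional[str]:
    """Return the transmission model a variant fits in one family, or None.

    Genotypes are alternate-allele counts on an autosome (assumed encoding):
    0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate.
    `child` is the affected child; `sibling` is the unaffected sibling.
    """
    # De novo: the affected child carries the allele, neither parent does,
    # and the unaffected sibling (if genotyped) does not carry it either.
    if child >= 1 and father == 0 and mother == 0:
        if sibling is None or sibling == 0:
            return "de_novo"
        return None

    # Autosomal recessive: the affected child is homozygous alternate,
    # both parents are heterozygous carriers, and the unaffected sibling
    # (if genotyped) is not homozygous alternate.
    if child == 2 and father == 1 and mother == 1:
        if sibling is None or sibling < 2:
            return "autosomal_recessive"
        return None

    return None


# Hypothetical genotypes for one variant in one family.
print(classify_segregation(child=1, father=0, mother=0, sibling=0))
```

A real analysis would also need rarity and functional-impact filters, plus separate handling of the X-linked, mitochondrial, and compound heterozygote models listed in the abstract; those are omitted here.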
Whole genome analysis of an extended pedigree with Prader–Willi Syndrome, hereditary hemochromatosis, and dysautonomia-like symptoms
This report includes the discovery and analysis of a pedigree with Prader–Willi Syndrome (PWS), hereditary hemochromatosis (HH), and dysautonomia-like symptoms. Nine members of the family participated in whole genome sequencing (WGS), which enabled a wide scope of variant calling from single-nucleotide polymorphisms to copy number variations. First, a 5.5 Mb de novo deletion is identified in the chromosome region 15q11.2 to 15q13.1 in the boy with PWS. Second, a female individual with HH is homozygous for the p.C282Y variant in HFE, a mutation known to be associated with HH. Her brother is homozygous for the same variant, although he has yet to be clinically diagnosed with HH. Third, none of the people with dysautonomia-like symptoms carry any reported or novel rare variants in IKBKAP that are implicated in familial dysautonomia (FD - HSAN III). Although two people with dysautonomia-like symptoms carry two heterozygous variants in NTRK1, a gene that has been shown to contribute to HSAN IV (congenital insensitivity to pain with anhidrosis, a disease that closely resembles FD), this variant is not present in the third proband. Fourth, WGS revealed pharmacogenetic variants influencing the metabolism of warfarin and simvastatin, which are being routinely prescribed to the proband. Finally, reports of the phenotypes were standardized with the Human Phenotype Ontology annotation, which may facilitate the search for other families with similar phenotypes. Due to the extreme heterogeneity and insufficient knowledge of human diseases, it is of crucial importance that both phenotypic data and genomic data are standardized and shared.
Reducing INDEL calling errors in whole-genome and exome sequencing data
Background: INDELs, especially those disrupting protein-coding regions of the genome, have been strongly associated with human diseases. However, there are still many errors with INDEL variant calling, driven by library preparation, sequencing biases, and algorithm artifacts. Methods: We characterized whole genome sequencing (WGS), whole exome sequencing (WES), and PCR-free sequencing data from the same samples to investigate the sources of INDEL errors. We also developed a classification scheme based on the coverage and composition to rank high and low quality INDEL calls. We performed a large-scale validation experiment on 600 loci, and found high-quality INDELs to have a substantially lower error rate than low-quality INDELs (7% vs. 51%). Results: Simulation and experimental data show that assembly-based callers are significantly more sensitive and robust for detecting large INDELs (>5 bp) than alignment-based callers, consistent with published data. The concordance of INDEL detection between WGS and WES is low (52%), and WGS data uniquely identifies 10.8-fold more high-quality INDELs. The validation rate for WGS-specific INDELs is also much higher than that for WES-specific INDELs (85% vs. 54%), and WES misses many large INDELs. In addition, the concordance for INDEL detection between standard WGS and PCR-free sequencing is 71%, and standard WGS data uniquely identifies 6.3-fold more low-quality INDELs. Furthermore, accurate detection with Scalpel of heterozygous INDELs requires 1.2-fold higher coverage than that for homozygous INDELs. Lastly, homopolymer A/T INDELs are a major source of low-quality INDEL calls, and they are highly enriched in the WES data. Conclusions: Overall, we show that accuracy of INDEL detection with WGS is much greater than WES even in the targeted region. We calculated that 60X WGS depth of coverage from the HiSeq platform is needed to recover 95% of INDELs detected by Scalpel. While this is higher than current sequencing practice, the deeper coverage may save total project costs because of the greater accuracy and sensitivity. Finally, we investigate sources of INDEL errors (e.g. capture deficiency, PCR amplification, homopolymers) with various data that will serve as a guideline to effectively reduce INDEL errors in genome sequencing.
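To make the concordance figures in this abstract concrete, here is a minimal Python sketch of how two INDEL call sets could be compared. It assumes that concordance means shared calls divided by the union of both call sets, and that two calls match only when chromosome, position, reference allele, and alternate allele are identical; the abstract specifies neither. The function name `compare_call_sets` and the toy calls are hypothetical.

```python
from typing import NamedTuple, Set, Tuple

# An INDEL call keyed by (chromosome, position, ref allele, alt allele).
Indel = Tuple[str, int, str, str]


class Comparison(NamedTuple):
    shared: int         # calls present in both sets
    only_a: int         # calls unique to the first set
    only_b: int         # calls unique to the second set
    concordance: float  # shared / union (assumed definition)


def compare_call_sets(a: Set[Indel], b: Set[Indel]) -> Comparison:
    """Compare two call sets by exact match on the call key."""
    shared = a & b
    union = a | b
    concordance = len(shared) / len(union) if union else 0.0
    return Comparison(len(shared), len(a - b), len(b - a), concordance)


# Toy data, not taken from the study.
wgs = {("chr1", 100, "AT", "A"), ("chr1", 250, "G", "GTT"), ("chr2", 40, "CAA", "C")}
wes = {("chr1", 100, "AT", "A"), ("chr3", 7, "T", "TA")}

result = compare_call_sets(wgs, wes)
print(result)
```

Exact matching is the simplest choice, but the same INDEL can be reported at different positions within a repeat, so a practical comparison would normally left-align and normalize calls first.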
