
    An investigation of causes of false positive single nucleotide polymorphisms using simulated reads from a small eukaryote genome

    Background: Single Nucleotide Polymorphisms (SNPs) are widely used molecular markers, and their use has increased massively since the inception of Next Generation Sequencing (NGS) technologies, which allow detection of large numbers of SNPs at low cost. However, both NGS data and their analysis are error-prone, which can lead to the generation of false positive (FP) SNPs. We explored the relationship between FP SNPs and seven factors involved in mapping-based variant calling - quality of the reference sequence, read length, choice of mapper and variant caller, mapping stringency and filtering of SNPs by read mapping quality and read depth. This resulted in 576 possible factor level combinations. We used error- and variant-free simulated reads to ensure that every SNP found was indeed a false positive. Results: The variation in the number of FP SNPs generated ranged from 0 to 36,621 for the 120 million base pairs (Mbp) genome. All of the experimental factors tested had statistically significant effects on the number of FP SNPs generated and there was a considerable amount of interaction between the different factors. Using a fragmented reference sequence led to a dramatic increase in the number of FP SNPs generated, as did relaxed read mapping and a lack of SNP filtering. The choice of reference assembler, mapper and variant caller also significantly affected the outcome. The effect of read length was more complex and suggests a possible interaction between mapping specificity and the potential for contributing more false positives as read length increases. Conclusions: The choice of tools and parameters involved in variant calling can have a dramatic effect on the number of FP SNPs produced, with particularly poor combinations of software and/or parameter settings yielding tens of thousands in this experiment. 
Between-factor interactions make simple recommendations for a SNP discovery pipeline difficult, but the quality of the reference sequence is clearly of paramount importance. Our findings are also a stark reminder that it can be unwise to use the relaxed mismatch settings provided as defaults by some read mappers when reads are being mapped to a relatively unfinished reference sequence from, e.g., a non-model organism in its early stages of genomic exploration.
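The abstract names filtering of SNPs by read mapping quality and read depth as one of the factors studied. A minimal Python sketch of such a filter follows; the record layout (`SnpCall`), the function name `filter_snps`, and both threshold values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    # Hypothetical record layout for illustration; not the paper's data format.
    position: int
    mapping_quality: float  # assumed: a mapping-quality summary for the site
    read_depth: int         # assumed: number of reads covering the site


def filter_snps(calls, min_mapq=30.0, min_depth=10):
    """Keep only calls meeting both thresholds (threshold values are assumptions)."""
    return [
        c for c in calls
        if c.mapping_quality >= min_mapq and c.read_depth >= min_depth
    ]


# Made-up example calls, not results from the study.
calls = [
    SnpCall(position=101, mapping_quality=60.0, read_depth=25),
    SnpCall(position=202, mapping_quality=12.0, read_depth=40),  # low mapping quality
    SnpCall(position=303, mapping_quality=55.0, read_depth=3),   # low depth
]
kept = filter_snps(calls)
print([c.position for c in kept])
```

Because the simulated reads in the study were error- and variant-free, any call surviving such a filter would still count as a false positive; the filter only reduces how many remain.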

    Transcriptomes and expression profiling of deep-sea corals from the Red Sea provide insight into the biology of azooxanthellate corals

    Despite the importance of deep-sea corals, our current understanding of their ecology and evolution is limited due to difficulties in sampling and studying deep-sea environments. Moreover, a recent reevaluation of habitat limitations has been suggested after characterization of deep-sea corals in the Red Sea, where they live at temperatures above 20 °C and at low oxygen concentrations. To gain further insight into the biology of deep-sea corals, we produced reference transcriptomes and studied gene expression of three deep-sea coral species from the Red Sea, i.e. Dendrophyllia sp., Eguchipsammia fistula, and Rhizotrochus typus. Our analyses suggest that deep-sea corals employ mitochondrial hypometabolism and anaerobic glycolysis to manage the low oxygen conditions present in the Red Sea. Notably, we found expression of genes related to surface cilia motion that presumably enhance small particle transport rates in the oligotrophic deep-sea environment. This is the first study to characterize transcriptomes and in situ gene expression for deep-sea corals. Our work offers several mechanisms by which deep-sea corals might cope with the distinct environmental conditions present in the Red Sea. As such, our data provide direction for future research and further insight into the organismal response of deep-sea corals to environmental change and ocean warming. This work was supported by King Abdullah University of Science and Technology (KAUST), baseline funds to CRV and Center Competitive Funding (CCF) Program FCC/1/1973-18-01.

    Using population admixture to help complete maps of the human genome

    Tens of millions of base pairs of euchromatic human genome sequence, including many protein-coding genes, have no known location in the human genome. We describe an approach for localizing the human genome's missing pieces by utilizing the patterns of genome sequence variation created by population admixture. We mapped the locations of 70 scaffolds spanning four million base pairs of the human genome's unplaced euchromatic sequence, including more than a dozen protein-coding genes, and identified eight large novel inter-chromosomal segmental duplications. We find that most of these sequences are hidden in the genome's heterochromatin, particularly its pericentromeric regions. Many cryptic, pericentromeric genes are expressed in RNA and have been maintained intact for millions of years while their expression patterns diverged from those of paralogous genes elsewhere in the genome. We describe how knowledge of the locations of these sequences can inform disease association and genome biology studies.

    Sequencing and de novo assembly of 150 genomes from Denmark as a population reference

    Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set of structural variants including many novel insertions and demonstrate how this variant catalogue enables further deciphering of known association mapping signals. We leverage the assemblies to provide 100 completely resolved major histocompatibility complex haplotypes and to resolve major parts of the Y chromosome. Our study provides a regional reference genome that we expect will improve the power of future association mapping studies and hence pave the way for precision medicine initiatives, which are now being launched in many countries including Denmark.

    A High-Resolution Map of Human Evolutionary Constraint Using 29 Mammals

    The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ~4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ~60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease. National Human Genome Research Institute (U.S.); National Institute of General Medical Sciences (U.S.) (Grant number GM82901); National Science Foundation (U.S.), Postdoctoral Fellowship (Award 0905968); National Science Foundation (U.S.), Career (0644282); National Institutes of Health (U.S.) (R01-HG004037); Alfred P. Sloan Foundation; Austrian Science Fund, Erwin Schrodinger Fellowship

    An update on methods for Sarcopenia Diagnosis: From bench to bedside

    Sarcopenia has been recognized as an age-related syndrome characterized by low muscle mass, low muscle strength, and low physical performance that is associated with an increased likelihood of adverse outcomes including falls, fractures, hospitalization, frailty and mortality. It is therefore necessary to identify the condition early so that interventions can be applied and the serious consequences of untreated sarcopenia prevented. Clinical definitions and diagnostic criteria for sarcopenia have been developed in recent years, and different tools have been proposed for screening subjects with sarcopenia and for evaluating muscle mass, muscle strength and physical performance. In this review we analyzed the diagnostic criteria of sarcopenia and examined the current assessment tools used for the diagnosis and screening of sarcopenia.

    Assessment of HCC response to Yttrium-90 radioembolization with gadoxetate disodium MRI: correlation with histopathology.

    Transarterial 90Y radioembolization (TARE) is increasingly being used for hepatocellular carcinoma (HCC) treatment. However, tumor response assessment after TARE may be challenging. We aimed to assess the diagnostic performance of gadoxetate disodium MRI for predicting complete pathologic necrosis (CPN) of HCC treated with TARE, using histopathology as the reference standard. This retrospective study included 48 patients (M/F: 36/12, mean age: 62 years) with HCC treated by TARE followed by surgery with gadoxetate disodium MRI within 90 days of surgery. Two radiologists evaluated tumor response using RECIST1.1, mRECIST, EASL, and LI-RADS-TR criteria and evaluated the percentage of necrosis on subtraction during late arterial, portal venous, and hepatobiliary phases (AP/PVP/HBP). Statistical analysis included inter-reader agreement, correlation between radiologic and pathologic percentage of necrosis, and prediction of CPN using logistic regression and ROC analyses. Histopathology demonstrated 71 HCCs (2.8 ± 1.7 cm, range: 0.5-7.5 cm) including 42 with CPN, 22 with partial necrosis, and 7 without necrosis. EASL and percentage of tumor necrosis on subtraction at the AP/PVP were independent predictors of CPN (p = 0.02-0.03). Percentage of necrosis, mRECIST, EASL, and LI-RADS-TR had fair to good performance for diagnosing CPN (AUCs: 0.78-0.83), with a significant difference between subtraction and LI-RADS-TR for reader 2, and in specificity between subtraction and other criteria for both readers (p-range: 0.01-0.04). Radiologic percentage of necrosis was significantly correlated to histopathologic degree of tumor necrosis (r = 0.66-0.8, p < 0.001). Percentage of tumor necrosis on subtraction and EASL criteria were significant independent predictors of CPN in HCC treated with TARE. Image subtraction should be considered for assessing HCC response to TARE when using MRI.
    • Percentage of tumor necrosis on image subtraction and EASL criteria are significant independent predictors of complete pathologic necrosis in hepatocellular carcinoma treated with 90Y radioembolization.
    • Subtraction, mRECIST, EASL, and LI-RADS-TR have fair to good performance for diagnosing complete pathologic necrosis in hepatocellular carcinoma treated with 90Y radioembolization.
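The abstract reports diagnostic performance as areas under the ROC curve (AUCs). As a rough illustration of what that number measures, the sketch below computes an AUC as the fraction of positive/negative pairs ranked correctly, with ties counted as half. The function name `roc_auc` and the scores and labels are made up for illustration; the abstract does not say how the study computed its AUCs.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive case scores above a negative one.

    Pairwise (Mann-Whitney style) formulation; ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data: radiologic percentage of necrosis per lesion, and whether
# histopathology showed complete necrosis (1) or not (0). Not from the study.
scores = [90, 70, 100, 80, 30, 50]
labels = [1, 0, 1, 1, 0, 1]
print(roc_auc(scores, labels))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported range of 0.78 to 0.83 should be read.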

    Evaluation of Methods for De Novo Genome Assembly from High-Throughput Sequencing Reads Reveals Dependencies That Affect the Quality of the Results

    Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality and the production of increasingly large numbers of usable sequences per instrument run continue to make whole-genome assembly an appealing target application. In this paper we evaluate the feasibility of de novo genome assembly from short reads (≤100 nucleotides) through a detailed study involving genomic sequences of various lengths and origins, in conjunction with several of the currently popular assembly programs. Our extensive analysis demonstrates that, in addition to sequencing coverage, attributes such as the architecture of the target genome, the assembly program used, the average read length and the observed sequencing error rates are powerful variables that affect the best achievable assembly of the target sequence in terms of size and correctness.