
    Chromosomal-level assembly of the Asian Seabass genome using long sequence reads and multi-layered scaffolding

    We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics
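The contig and scaffold N50 figures quoted above follow the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal sketch of that calculation follows; the function name `n50` and the example lengths are illustrative assumptions, not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the summed length of all sequences.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in arbitrary units).
toy_contigs = [10, 9, 8, 7, 6, 5, 4, 3, 2]
print(n50(toy_contigs))
```

With real data the lengths would typically come from the sequence records of an assembly FASTA file; a higher N50 indicates a more contiguous assembly but says nothing about its correctness.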

    Single haplotype assembly of the human genome from a hydatidiform mole

    A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content shows this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicates misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly

    Identification of methylated deoxyadenosines in vertebrates reveals diversity in DNA modifications.

    Methylation of cytosine deoxynucleotides generates 5-methylcytosine (m(5)dC), a well-established epigenetic mark. However, in higher eukaryotes much less is known about modifications affecting other deoxynucleotides. Here, we report the detection of N(6)-methyldeoxyadenosine (m(6)dA) in vertebrate DNA, specifically in Xenopus laevis but also in other species including mouse and human. Our methylome analysis reveals that m(6)dA is widely distributed across the eukaryotic genome and is present in different cell types but is commonly depleted from gene exons. Thus, direct DNA modifications might be more widespread than previously thought.
    M.J.K. was supported by the Long-Term Human Frontiers Fellowship (LT000149/2010-L), the Medical Research Council grant (G1001690), and by the Isaac Newton Trust Fellowship (R G76588). The work was sponsored by the Biotechnology and Biological Sciences Research Council grant BB/M022994/1 (J.B.G. and M.J.K.). The Gurdon laboratory is funded by the grant 101050/Z/13/Z (J.B.G.) from the Wellcome Trust, and is supported by the Gurdon Institute core grants, namely by the Wellcome Trust Core Grant (092096/Z/10/Z) and by the Cancer Research UK Grant (C6946/A14492). C.R.B. and G.E.A. are funded by the Wellcome Trust Core Grant. We are grateful to D. Simpson and R. Jones-Green for preparing X. laevis eggs and oocytes, F. Miller for providing us with M. musculus tissue, T. Dyl for X. laevis eggs and D. rerio samples, and to Gurdon laboratory members for their critical comments. We thank U. Ruether for providing us with M. musculus kidney DNA (Entwicklungs- und Molekularbiologie der Tiere, Heinrich Heine Universitaet Duesseldorf, Germany). We also thank J. Ahringer, S. Jackson, A. Bannister and T. Kouzarides for critical input and advice, M. Sciacovelli and E. Gaude for suggestions.
    This is the author accepted manuscript. The final version is available from Nature Publishing Group via http://dx.doi.org/10.1038/nsmb.314

    Cryptic species in a well-known habitat: applying taxonomics to the amphipod genus Epimeria (Crustacea, Peracarida)

    Taxonomy plays a central role in biological sciences. It provides a communication system for scientists as it aims to enable correct identification of the studied organisms. As a consequence, species descriptions should seek to include as much available information as possible at species level to follow an integrative concept of ‘taxonomics’. Here, we describe the cryptic species Epimeria frankei sp. nov. from the North Sea, and also redescribe its sister species, Epimeria cornigera. The morphological information obtained is substantiated by DNA barcodes and complete nuclear 18S rRNA gene sequences. In addition, we provide, for the first time, full mitochondrial genome data as part of a metazoan species description for a holotype, as well as the neotype. This study represents the first successful implementation of the recently proposed concept of taxonomics, using data from high-throughput technologies for integrative taxonomic studies, allowing the highest level of confidence for both biodiversity and ecological research

    The Application of DNA Barcodes for the Identification of Marine Crustaceans from the North Sea and Adjacent Regions

    In recent years, DNA barcoding has become a popular method of choice for molecular specimen identification. Here we present a comprehensive DNA barcode library of various crustacean taxa found in the North Sea, one of the most extensively studied marine regions of the world. Our data set includes 1,332 barcodes covering 205 species, including taxa of the Amphipoda, Copepoda, Decapoda, Isopoda, Thecostraca, and others. This dataset represents the most extensive DNA barcode library of the Crustacea in terms of species number to date. By using the Barcode of Life Data Systems (BOLD), unique BINs were identified for 198 (96.6%) of the analyzed species. Six species were characterized by two BINs (2.9%), and three BINs were found for the amphipod species Gammarus salinus Spooner, 1947 (0.4%). Intraspecific distances with values higher than 2.2% were revealed for 13 species (6.3%). Exceptionally high distances of up to 14.87% between two distinct but monophyletic clusters were found for the parasitic copepod Caligus elongatus Nordmann, 1832, supporting the results of previous studies that indicated the existence of an overlooked sea louse species. In contrast to these high distances, haplotype-sharing was observed for two decapod spider crab species, Macropodia parva Van Noort & Adema, 1985 and Macropodia rostrata (Linnaeus, 1761), underlining the need for a taxonomic revision of both species. Summarizing the results, our study confirms the application of DNA barcodes as a highly effective identification system for the analyzed marine crustaceans of the North Sea and represents an important milestone for modern biodiversity assessment studies using barcode sequence

    Comparative genomics of the major parasitic worms

    Parasitic nematodes (roundworms) and platyhelminths (flatworms) cause debilitating chronic infections of humans and animals, decimate crop production and are a major impediment to socioeconomic development. Here we report a broad comparative study of 81 genomes of parasitic and non-parasitic worms. We have identified gene family births and hundreds of expanded gene families at key nodes in the phylogeny that are relevant to parasitism. Examples include gene families that modulate host immune responses, enable parasite migration through host tissues or allow the parasite to feed. We reveal extensive lineage-specific differences in core metabolism and protein families historically targeted for drug development. From an in silico screen, we have identified and prioritized new potential drug targets and compounds for testing. This comparative genomics resource provides a much-needed boost for the research community to understand and combat parasitic worms

    "Head-to-head" and "tail-to-tail" 180-degree domain walls in an isolated ferroelectric

    "Head-to-head" and "tail-to-tail" 180-degree domain walls in a finite isolated ferroelectric sample are theoretically studied using Landau theory. The full set of equations, suitable for numerical calculations, is developed. The explicit expressions for the polarization profile across the walls are derived for several limiting cases and wall widths are estimated. It is shown analytically that different regimes of screening and different dependences for the width of charged domain walls on the temperature and parameters of the system are possible, depending on spontaneous polarization and concentration of carriers in the material. It is shown that the half-width of charged domain walls in typical perovskites is about the nonlinear Thomas-Fermi screening length and about one order of magnitude larger than the half-width of neutral domain walls. The formation energies of "head-to-head" walls under different regimes of screening are obtained, neglecting the poling ability of the surface. It is shown that either the "head-to-head" or the "tail-to-tail" configuration can be energetically favorable in comparison with the monodomain state of the ferroelectric if the poling ability of the surface is large enough. If this is not the case, the existence of charged domain walls in bulk ferroelectrics is merely a result of the domain-growth kinetics. The size effect corresponding to the competition between the state with a charged domain wall, the single-domain state, the multidomain state, and the state with zero polarization is considered. The results obtained for the case of an isolated ferroelectric sample were compared with the results for an electroded sample. It was shown that a charged domain wall in an electroded sample can be either metastable or stable, depending on the work function difference between electrodes and ferroelectric and the poling ability of the electrode/ferroelectric interface. Comment: 47 pages, 10 figure

    Parameters for accurate genome alignment

    Background: Genome sequence alignments form the basis of much research. Genome alignment depends on various mundane but critical choices, such as how to mask repeats and which score parameters to use. Surprisingly, there has been no large-scale assessment of these choices using real genomic data. Moreover, rigorous procedures to control the rate of spurious alignment have not been employed. Results: We have assessed 495 combinations of score parameters for alignment of animal, plant, and fungal genomes. As our gold-standard of accuracy, we used genome alignments implied by multiple alignments of proteins and of structural RNAs. We found the HOXD scoring schemes underlying alignments in the UCSC genome database to be far from optimal, and suggest better parameters. Higher values of the X-drop parameter are not always better. E-values accurately indicate the rate of spurious alignment, but only if tandem repeats are masked in a non-standard way. Finally, we show that γ-centroid (probabilistic) alignment can find highly reliable subsets of aligned bases. Conclusions: These results enable more accurate genome alignment, with reliability measures for local alignments and for individual aligned bases. This study was made possible by our new software, LAST, which can align vertebrate genomes in a few hours (http://last.cbrc.jp/).
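For readers unfamiliar with the X-drop parameter mentioned in this abstract, the sketch below illustrates the general idea on an ungapped extension: the alignment is extended base by base and abandoned once the running score falls more than a threshold below the best score seen so far. This is a simplified illustration under assumed scores (match +1, mismatch -1), not the implementation used in LAST; the function name `xdrop_extend` and the example sequences are hypothetical.

```python
def xdrop_extend(a, b, match=1, mismatch=-1, x_drop=3):
    """Extend an ungapped alignment from the start of both sequences.

    Extension stops once the running score drops more than `x_drop`
    below the best score seen so far. Returns (best_score, best_length),
    where best_length is the number of aligned columns giving best_score.
    """
    score = 0
    best_score = 0
    best_length = 0
    for i, (x, y) in enumerate(zip(a, b), start=1):
        score += match if x == y else mismatch
        if score > best_score:
            best_score, best_length = score, i
        elif best_score - score > x_drop:
            break  # score has fallen too far; give up extending
    return best_score, best_length


# Hypothetical sequences: a short mismatch run separates two matching blocks.
seq_a = "AAAATTTAAAAAAAA"
seq_b = "AAAACCCAAAAAAAA"
print(xdrop_extend(seq_a, seq_b, x_drop=2))  # stops inside the mismatch run
print(xdrop_extend(seq_a, seq_b, x_drop=3))  # bridges the mismatch run
```

A larger threshold lets the extension cross poorly matching stretches, which can join true homologous blocks but can also chain unrelated sequence onto a real alignment; this trade-off is one reason a higher value need not be better.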

    Fast Identification and Removal of Sequence Contamination from Genomic and Metagenomic Datasets

    High-throughput sequencing technologies have strongly impacted microbiology, providing a rapid and cost-effective way of generating draft genomes and exploring microbial diversity. However, sequences obtained from impure nucleic acid preparations may contain DNA from sources other than the sample. Those sequence contaminations are a serious concern to the quality of the data used for downstream analysis, causing misassembly of sequence contigs and erroneous conclusions. Therefore, the removal of sequence contaminants is a necessary and required step for all sequencing projects. We developed DeconSeq, a robust framework for the rapid, automated identification and removal of sequence contamination in longer-read datasets (150 bp mean read length). DeconSeq is publicly available as standalone and web-based versions. The results can be exported for subsequent analysis, and the databases used for the web-based version are automatically updated on a regular basis. DeconSeq categorizes possible contamination sequences, eliminates redundant hits with higher similarity to non-contaminant genomes, and provides graphical visualizations of the alignment results and classifications. Using DeconSeq, we conducted an analysis of possible human DNA contamination in 202 previously published microbial and viral metagenomes and found possible contamination in 145 (72%) metagenomes with as high as 64% contaminating sequences. This new framework allows scientists to automatically detect and efficiently remove unwanted sequence contamination from their datasets while eliminating critical limitations of current methods. DeconSeq's web interface is simple and user-friendly. The standalone version allows offline analysis and integration into existing data processing pipelines. DeconSeq's results reveal whether the sequencing experiment has succeeded, whether the correct sample was sequenced, and whether the sample contains any sequence contamination from DNA preparation or host. 
    In addition, the analysis of 202 metagenomes demonstrated significant contamination of the non-human associated metagenomes, suggesting that this method is appropriate for screening all metagenomes. DeconSeq is available at http://deconseq.sourceforge.net/

    Metagenomic identification of severe pneumonia pathogens in mechanically-ventilated patients: a feasibility and clinical validity study

    BACKGROUND: Metagenomic sequencing of respiratory microbial communities for pathogen identification in pneumonia may help overcome the limitations of culture-based methods. We examined the feasibility and clinical validity of rapid-turnaround metagenomics with Nanopore™ sequencing of clinical respiratory specimens. METHODS: We conducted a case-control study of mechanically-ventilated patients with pneumonia (nine culture-positive and five culture-negative) and without pneumonia (eight controls). We collected endotracheal aspirates and applied a microbial DNA enrichment method prior to metagenomic sequencing with the Oxford Nanopore MinION device. For reference, we compared Nanopore results against clinical microbiologic cultures and bacterial 16S rRNA gene sequencing. RESULTS: Human DNA depletion enabled in-depth sequencing of microbial communities. In culture-positive cases, Nanopore revealed communities with high abundance of the bacterial or fungal species isolated by cultures. In four cases with resistant clinical isolates, Nanopore detected antibiotic resistance genes corresponding to the phenotypic resistance in antibiograms. In culture-negative pneumonia, Nanopore revealed probable bacterial pathogens in 1/5 cases and Candida colonization in 3/5 cases. In controls, Nanopore showed high abundance of oral bacteria in 5/8 subjects, and identified colonizing respiratory pathogens in other subjects. Nanopore and 16S sequencing showed excellent concordance for the most abundant bacterial taxa. CONCLUSIONS: We demonstrated technical feasibility and proof-of-concept clinical validity of Nanopore metagenomics for severe pneumonia diagnosis, with striking concordance with positive microbiologic cultures, and clinically actionable information obtained from sequencing in culture-negative samples. Prospective studies with real-time metagenomics are warranted to examine the impact on antimicrobial decision-making and clinical outcomes