ntLink: a toolkit for de novo genome assembly scaffolding and mapping using long reads
With the increasing affordability and accessibility of genome sequencing
data, de novo genome assembly is an important first step to a wide variety of
downstream studies and analyses. Therefore, bioinformatics tools that enable
the generation of high-quality genome assemblies in a computationally efficient
manner are essential. Recent developments in long-read sequencing technologies
have greatly benefited genome assembly work, including scaffolding, by
providing long-range evidence that can aid in resolving the challenging
repetitive regions of complex genomes. ntLink is a flexible and
resource-efficient genome scaffolding tool that utilizes long-read sequencing
data to improve upon draft genome assemblies built using any sequencing
technology, including the same long reads. Instead of using read alignments
to identify candidate joins, ntLink utilizes minimizer-based mappings to infer
how input sequences should be ordered and oriented into scaffolds. Recent
improvements to ntLink have added important features such as overlap detection,
gap-filling and in-code scaffolding iterations. Here, we present three basic
protocols demonstrating how to use each of these new features to yield highly
contiguous genome assemblies, while still maintaining ntLink's proven
computational efficiency. Further, as we illustrate in the alternate protocols,
the lightweight minimizer-based mappings that enable ntLink scaffolding can
also be utilized for other downstream applications, such as misassembly
detection. With its modularity and multiple modes of execution, ntLink has
broad benefit to the genomics community, from genome scaffolding and beyond.
ntLink is an open-source project and is freely available from
https://github.com/bcgsc/ntLink.
Comment: 23 pages, 2 figures
Swarm v3: towards tera-scale amplicon clustering
Motivation: Previously we presented swarm, an open-source amplicon clustering programme that produces fine-scale molecular operational taxonomic units (OTUs) that are free of arbitrary global clustering thresholds. Here, we present swarm v3 to address issues of contemporary datasets that are growing towards terabyte sizes.
Results: When compared with previous swarm versions, swarm v3 has modernized C++ source code, reduced memory footprint by up to 50%, optimized CPU usage and multithreading (more than 7 times faster with default parameters), and it has been extensively tested for its robustness and logic
Identifying cancer mutation targets across thousands of samples: MuteProc, a high throughput mutation analysis pipeline
BACKGROUND: In the past decade, bioinformatics tools have matured enough to reliably perform sophisticated primary data analysis on Next Generation Sequencing (NGS) data, such as mapping, assembly and variant calling. However, there is still a dire need for improvements in higher-level analyses such as NGS data organization, analysis of mutation patterns and Genome Wide Association Studies (GWAS).
RESULTS: We present a high-throughput pipeline for identifying cancer mutation targets, capable of processing billions of variations across thousands of samples. This pipeline is coupled with our Human Variation Database to provide more complex downstream analysis of the variations hosted in the database. Most notably, these analyses include finding significantly mutated regions across multiple genomes and regions with mutational preferences within certain types of cancers. The results of the analysis are presented in HTML summary reports that incorporate gene annotations from various resources for the reported regions.
CONCLUSION: MuteProc is available for download through the Vancouver Short Read Analysis Package on Sourceforge: http://vancouvershortr.sourceforge.net. Instructions for use and a tutorial are provided on the accompanying wiki pages at https://sourceforge.net/apps/mediawiki/vancouvershortr/index.php?title=Pipeline_introduction
Long-insert sequence capture detects high copy numbers in a defence-related beta-glucosidase gene βglu-1 with large variations in white spruce but not Norway spruce
Conifers are long-lived and slow-evolving, thus requiring effective defences against their fast-evolving insect natural enemies. The copy number variation (CNV) of two key acetophenone biosynthesis genes Ugt5/Ugt5b and βglu-1 may provide a plausible mechanism underlying the constitutively variable defence in white spruce (Picea glauca) against its primary defoliator, spruce budworm. This study develops a long-insert sequence capture probe set (Picea_hung_p1.0) for quantifying copy number of βglu-1-like, Ugt5-like genes and single-copy genes on 38 Norway spruce (Picea abies) and 40 P. glauca individuals from eight and nine provenances across Europe and North America respectively. We developed local assemblies (Piabi_c1.0 and Pigla_c.1.0), full-length transcriptomes (PIAB_v1 and PIGL_v1), and gene models to characterise the diversity of βglu-1 and Ugt5 genes. We observed very large copy numbers of βglu-1, with up to 381 copies in a single P. glauca individual. We observed among-provenance CNV of βglu-1 in P. glauca but not P. abies. Ugt5b was predominantly single-copy in both species. This study generates critical hypotheses for testing the emergence and mechanism of extreme CNV, the dosage effect on phenotype, and the varying copy number of genes within the same pathway. We demonstrate new approaches to overcome experimental challenges in genomic research in conifer defences.
The genetic landscape of high-risk neuroblastoma
Neuroblastoma is a malignancy of the developing sympathetic nervous system that often presents with widespread metastatic disease, resulting in survival rates of less than 50% [1]. To determine the spectrum of somatic mutation in high-risk neuroblastoma, we studied 240 cases using a combination of whole exome, genome and transcriptome sequencing as part of the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative. Here we report a low median exonic mutation frequency of 0.60 per megabase (0.48 non-silent), and remarkably few recurrently mutated genes in these tumors. Genes with significant somatic mutation frequencies included ALK (9.2% of cases), PTPN11 (2.9%), ATRX (2.5%, an additional 7.1% had focal deletions), MYCN (1.7%, a recurrent p.Pro44Leu alteration), and NRAS (0.83%). Rare, potentially pathogenic germline variants were significantly enriched in ALK, CHEK2, PINK1, and BARD1. The relative paucity of recurrent somatic mutations in neuroblastoma challenges current therapeutic strategies reliant upon frequently altered oncogenic drivers.
Conifers Concentrate Large Numbers of NLR Immune Receptor Genes on One Chromosome
Nucleotide-binding domain and leucine-rich repeat (NLR) immune receptor genes form a major line of defense in plants, acting in both pathogen recognition and resistance machinery activation. NLRs are reported to form large gene clusters in limber pine (Pinus flexilis), but it is unknown how widespread this genomic architecture may be among the extant species of conifers (Pinophyta). We used comparative genomic analyses to assess patterns in the abundance, diversity, and genomic distribution of NLR genes. Chromosome-level whole genome assemblies and high-density linkage maps in the Pinaceae, Cupressaceae, Taxaceae, and other gymnosperms were scanned for NLR genes using existing and customized pipelines. The discovered genes were mapped across chromosomes and linkage groups and analyzed phylogenetically for evolutionary history. Conifer genomes are characterized by dense clusters of NLR genes, highly localized on one chromosome. These clusters are rich in TNL-encoding genes, which seem to have formed through multiple tandem duplication events. In contrast to angiosperms and nonconiferous gymnosperms, genomic clustering of NLR genes is ubiquitous in conifers. NLR-dense genomic regions are likely to influence a large part of the plant's resistance, informing our understanding of adaptation to biotic stress and the development of genetic resources through breeding
Draft genome of the mountain pine beetle, Dendroctonus ponderosae Hopkins, a major forest pest
