Parallel approach to sliding window sums
Sliding window sums are widely used in bioinformatics applications, including
sequence assembly, k-mer generation, hashing and compression. New vector
algorithms that utilize the Advanced Vector Extensions (AVX) instructions
available on modern processors, or the parallel compute units on GPUs and
FPGAs, would provide a significant performance boost for these bioinformatics
applications. We develop a generic vectorized sliding sum algorithm whose
speedup, for window size w and number of processors P, is O(P/w) for a generic
sliding sum. For a sum with a commutative operator, the speedup is improved to
O(P/log(w)). When applied to the genomic application of minimizer-based k-mer
table generation using AVX instructions, we obtain a speedup of over 5X.
Comment: 10 pages, 5 figures
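The abstract does not describe how its vectorized algorithm works. As a hedged illustration of why sliding sums parallelize at all, the sketch below uses a different, well-known technique, the van Herk/Gil-Werman block scheme, rather than the authors' method: split the input into blocks of length w, run a forward (prefix) and a backward (suffix) scan inside each block, and combine at most two partial results per window. It needs only an associative operator, so it covers plain sums as well as the sliding minimum used for minimizers. The function name `sliding_window_reduce` is hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def sliding_window_reduce(xs: Sequence[T], w: int, op: Callable[[T, T], T]) -> List[T]:
    """Combine every length-w window of xs with an associative operator op.

    Block prefix/suffix scheme (van Herk / Gil-Werman), not the paper's method:
    a window either coincides with one block or straddles two adjacent blocks,
    so its value is suffix[i] op prefix[i + w - 1].
    """
    n = len(xs)
    if not 1 <= w <= n:
        raise ValueError("window size must satisfy 1 <= w <= len(xs)")
    prefix = list(xs)  # prefix[j]: op over xs[block_start .. j]
    suffix = list(xs)  # suffix[j]: op over xs[j .. block_end]
    # Blocks are independent, so this loop could be split across cores or SIMD lanes.
    for start in range(0, n, w):
        end = min(start + w, n)
        for j in range(start + 1, end):
            prefix[j] = op(prefix[j - 1], xs[j])
        for j in range(end - 2, start - 1, -1):
            suffix[j] = op(xs[j], suffix[j + 1])
    out = []
    for i in range(n - w + 1):
        last = i + w - 1
        if i % w == 0:
            out.append(prefix[last])  # window is exactly one block
        else:
            # Left part (end of block i // w) then right part (start of next block);
            # operand order is preserved, so op need not be commutative.
            out.append(op(suffix[i], prefix[last]))
    return out


if __name__ == "__main__":
    print(sliding_window_reduce([1, 2, 3, 4, 5], 2, lambda a, b: a + b))  # [3, 5, 7, 9]
    print(sliding_window_reduce([5, 1, 4, 2, 3], 3, min))  # [1, 1, 2]
```

With `min` as the operator and hashed k-mers as input, each output element is the minimum hash of one window, i.e. a minimizer value. Every position costs at most three calls to `op` regardless of w, which is what makes block-based schemes attractive for vector hardware.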
Accelerating exhaustive pairwise metagenomic comparisons
In this manuscript, we present an optimized and parallel version of our previous work IMSAME, an exhaustive gapped aligner for the accurate pairwise comparison of metagenomes. Parallelization strategies are applied to take advantage of modern multiprocessor architectures. In addition, we provide sequential optimizations that reduce CPU time and memory consumption. These algorithmic and computational enhancements enable IMSAME to calculate near-optimal alignments, which are used to directly assess similarity between metagenomes without requiring reference databases. We show that the overall efficiency of the parallel implementation exceeds 80% while retaining scalability as the number of parallel cores increases. Moreover, we show that the sequential optimizations yield up to an 8x speedup for scenarios with larger data. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tec
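IMSAME's parallel design is not detailed in the abstract. As a hedged sketch of why exhaustive pairwise comparison parallelizes well, the code below scores every read of one metagenome against every read of the other with a plain Smith-Waterman local alignment (linear gap penalty) and distributes the independent query reads over worker processes. The scoring values, the `min_score` threshold, and all function names are hypothetical choices for illustration; IMSAME's own scoring and heuristics are not reproduced.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial
from typing import Sequence

# Hypothetical scoring scheme, not IMSAME's parameters.
MATCH, MISMATCH, GAP = 2, -1, -2


def local_score(a: str, b: str) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty (two-row DP)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            cur[j] = max(0, diag, prev[j] + GAP, cur[j - 1] + GAP)
            best = max(best, cur[j])
        prev = cur
    return best


def best_hit(read: str, reference: Sequence[str]) -> int:
    """Exhaustive search: best local score of one read against every reference read."""
    return max((local_score(read, r) for r in reference), default=0)


def shared_fraction(
    query: Sequence[str], reference: Sequence[str], min_score: int, workers: int = 1
) -> float:
    """Fraction of query reads whose best hit in reference scores >= min_score.

    Query reads are independent of each other, so they can be split across processes.
    """
    task = partial(best_hit, reference=reference)
    if workers > 1:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            chunk = max(1, len(query) // (4 * workers))
            scores = list(pool.map(task, query, chunksize=chunk))
    else:
        scores = [task(r) for r in query]
    if not scores:
        return 0.0
    return sum(s >= min_score for s in scores) / len(scores)
```

Parallel efficiency is usually defined as E = T1 / (P · TP), the sequential time divided by P times the parallel time; an efficiency above 80% on 16 cores therefore corresponds to a speedup above 0.8 × 16 = 12.8.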
Isolation and characterization of the full-length cDNA encoding a member of a novel cytochrome p450 family (CYP320A1) from the tropical freshwater snail, Biomphalaria glabrata, intermediate host for Schistosoma mansoni
Cytochrome p450s (cyp450s) are a family of structurally related proteins with diverse functions, including steroid synthesis and the breakdown of toxins. This paper reports the full-length sequence of a novel cyp450 gene, the first to be isolated from the tropical freshwater snail Biomphalaria glabrata, an important intermediate host of Schistosoma mansoni. The nucleotide sequence is 2291 bp long, with a predicted amino acid sequence of 584 aa. The sequence shows conserved cyp450 structural motifs but is sufficiently different from previously reported cyp450 sequences to be given a new classification, CYP320A1. The gene was initially identified as down-regulated in partially resistant snails in response to S. mansoni infection; however, its amplification by RT-PCR in both totally resistant and susceptible snail lines exposed to infection, and in all tissues examined, suggests ubiquitous expression. Characterization of the first cyp450 from B. glabrata is significant for understanding the evolution of these metabolically important proteins.
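The coding arithmetic behind those figures: 584 residues need 584 × 3 = 1752 bp of codons plus a 3 bp stop codon, i.e. 1755 bp, which would leave about 536 bp of the 2291 bp cDNA as untranslated region (assuming the reported length covers the whole transcript). A minimal sketch of how a predicted protein is read off a cDNA, using the standard genetic code and simply picking the longest ATG-to-stop open reading frame on the forward strand; this is a generic illustration with a hypothetical function name, not the authors' annotation procedure.

```python
from typing import Optional

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def longest_orf_protein(cdna: str) -> Optional[str]:
    """Translate the longest ATG...stop open reading frame on the forward strand.

    Returns None if no complete ORF exists; codons with ambiguous bases become 'X'.
    """
    seq = cdna.upper().replace("U", "T")
    best = None
    for frame in range(3):
        start = None  # index of the first ATG in the currently open reading frame
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and CODON_TABLE.get(codon) == "*":
                # The stop codon itself is not translated.
                protein = "".join(
                    CODON_TABLE.get(seq[p:p + 3], "X") for p in range(start, pos, 3)
                )
                if best is None or len(protein) > len(best):
                    best = protein
                start = None
    return best
```

For example, `longest_orf_protein("CCATGAAATTTTAGCC")` finds ATG AAA TTT followed by the TAG stop in the third frame and returns `"MKF"`; the protein length in residues is the ORF length in bp, minus the stop codon, divided by three.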
Convolutional LSTM Networks for Subcellular Localization of Proteins
Machine learning is widely used to analyze biological sequence data.
Non-sequential models such as SVMs or feed-forward neural networks are often
used although they have no natural way of handling sequences of varying length.
Recurrent neural networks such as the long short-term memory (LSTM) model, on
the other hand, are designed to handle sequences. In this study we demonstrate
that LSTM networks predict the subcellular location of proteins given only the
protein sequence with high accuracy (0.902), outperforming current
state-of-the-art algorithms. We further improve the performance by introducing
convolutional filters and experiment with an attention mechanism that lets the
LSTM focus on specific parts of the protein. Lastly, we introduce new
visualizations of both the convolutional filters and the attention mechanism
and show how they can be used to extract biologically relevant knowledge from
the LSTM networks.
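The central point above is that a recurrent model consumes a protein of any length and still produces a fixed-size summary for classification, which a fixed-length model such as an SVM has no natural way to do. The toy forward pass below illustrates only that: one-hot encoding, a standard LSTM cell, and a softmax over a hypothetical number of localization classes. The weights are random and untrained, the study's convolutional filters and attention mechanism are omitted, and all names are hypothetical.

```python
import math
import random
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq: str) -> List[List[float]]:
    """Encode a protein as one 20-dimensional indicator vector per residue."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in seq.upper()]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class ToyLSTMClassifier:
    """Untrained single-layer LSTM plus softmax head (random weights, illustration only)."""

    def __init__(self, n_in: int, n_hidden: int, n_classes: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        width = n_in + n_hidden + 1  # each gate sees [x_t; h_{t-1}; bias]
        # One weight matrix per gate: input (i), forget (f), candidate (g), output (o).
        self.gates = {
            g: [[rng.uniform(-0.1, 0.1) for _ in range(width)] for _ in range(n_hidden)]
            for g in "ifgo"
        }
        self.head = [
            [rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)] for _ in range(n_classes)
        ]

    @staticmethod
    def _affine(rows: List[List[float]], z: Sequence[float]) -> List[float]:
        return [sum(w * v for w, v in zip(row, z)) for row in rows]

    def encode(self, xs: Sequence[Sequence[float]]) -> List[float]:
        """Run the LSTM over any number of steps and return the final hidden state."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in xs:
            z = list(x) + h + [1.0]
            i = [sigmoid(v) for v in self._affine(self.gates["i"], z)]
            f = [sigmoid(v) for v in self._affine(self.gates["f"], z)]
            g = [math.tanh(v) for v in self._affine(self.gates["g"], z)]
            o = [sigmoid(v) for v in self._affine(self.gates["o"], z)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h

    def predict_proba(self, seq: str) -> List[float]:
        """Softmax over (hypothetical) localization classes from the final hidden state."""
        logits = self._affine(self.head, self.encode(one_hot(seq)) + [1.0])
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]
```

Proteins of 3 and of 300 residues both yield a hidden vector of length `n_hidden`, so the same classification head applies to either; training would replace the random weights.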
Microbiome profiling by Illumina sequencing of combinatorial sequence-tagged PCR products
We developed a low-cost, high-throughput microbiome profiling method that
uses combinatorial sequence tags attached to PCR primers that amplify the rRNA
V6 region. Amplified PCR products are sequenced using an Illumina paired-end
protocol to generate millions of overlapping reads. Combinatorial sequence
tagging can be used to examine hundreds of samples with far fewer primers than
are required when sequence tags are incorporated at only a single end. The
number of reads generated permitted saturating or near-saturating analysis of
samples of the vaginal microbiome. The large number of reads allowed an
in-depth analysis of errors, and we found that PCR-induced errors composed the
vast majority of non-organism-derived species variants, an observation that
has significant implications for sequence clustering of similar high-throughput
data. We show that the short reads are sufficient to assign organisms to the
genus or species level in most cases. We suggest that this method will be
useful for the deep sequencing of any short nucleotide region that is
taxonomically informative; these include the V3 and V5 regions of the bacterial
16S rRNA genes and the eukaryotic V9 region that is gaining popularity for
sampling protist diversity.
Comment: 28 pages, 13 figures
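Combinatorial tagging identifies a sample by the pair (forward tag, reverse tag) rather than by a single tag, so f forward and r reverse tagged primers can label f × r samples with only f + r primers; for example, 12 forward and 8 reverse tags would cover 96 samples with 20 primers, whereas single-end tagging would need 96 distinct tagged primers. The sketch below shows how paired-end reads could be assigned to samples from their leading tags. The tag sequences, exact-prefix matching, and the sample-sheet layout are hypothetical; the study's actual tag design and error handling are not given in the abstract.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

ReadPair = Tuple[str, str]


def match_tag(read: str, tags: Dict[str, str]) -> Optional[str]:
    """Return the name of the tag the read starts with, or None.

    Exact matching only; assumes equal-length tags so no tag is a prefix of another.
    """
    for name, seq in tags.items():
        if read.startswith(seq):
            return name
    return None


def demultiplex(
    pairs: Iterable[ReadPair],
    fwd_tags: Dict[str, str],
    rev_tags: Dict[str, str],
    sample_sheet: Dict[Tuple[str, str], str],
) -> Counter:
    """Count read pairs per sample using the (forward tag, reverse tag) combination."""
    counts: Counter = Counter()
    for r1, r2 in pairs:
        key = (match_tag(r1, fwd_tags), match_tag(r2, rev_tags))
        counts[sample_sheet.get(key, "unassigned")] += 1
    return counts
```

Pairs whose tag combination is missing from the sample sheet, or where either tag is unreadable, are counted as `"unassigned"` rather than forced into a sample.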
Functional significance may underlie the taxonomic utility of single amino acid substitutions in conserved proteins
We hypothesized that some amino acid substitutions in conserved proteins that are strongly fixed by critical functional roles would show lineage-specific distributions. As an example of an archetypal conserved eukaryotic protein, we considered the active site of β-tubulin. Our analysis identified one amino acid substitution, β-tubulin F224, which was highly lineage specific. Investigation of β-tubulin for other phylogenetically restricted amino acids identified several with apparent specificity for well-defined phylogenetic groups. Intriguingly, none showed specificity for “supergroups” other than the unikonts. To understand why, we analysed the β-tubulin Neighbor-Net and demonstrated a fundamental division between core β-tubulins (plant-like) and divergent β-tubulins (animal and fungal). F224 was almost completely restricted to the core β-tubulins, while divergent β-tubulins possessed Y224. Thus, our specific example offers insight into the restrictions associated with the co-evolution of β-tubulin during the radiation of eukaryotes, underlining a fundamental dichotomy between F-type, core β-tubulins and Y-type, divergent β-tubulins. More broadly, our study provides proof of principle for the taxonomic utility of critical amino acids in the active sites of conserved proteins.
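Checking whether a residue such as β-tubulin position 224 is lineage specific amounts to reading one column of a multiple sequence alignment and tabulating its residues per group. A minimal sketch, assuming an existing alignment in which one row is a reference whose ungapped numbering defines the position; the toy sequences, group labels, and function names are hypothetical, and the study's Neighbor-Net analysis is not reproduced.

```python
from collections import Counter, defaultdict
from typing import Dict, Mapping


def column_for_position(aligned_ref: str, position: int) -> int:
    """Map a 1-based residue number in the ungapped reference to an alignment column."""
    count = 0
    for col, ch in enumerate(aligned_ref):
        if ch != "-":
            count += 1
            if count == position:
                return col
    raise ValueError(f"reference has fewer than {position} residues")


def residues_by_group(
    alignment: Mapping[str, str], groups: Mapping[str, str], ref_id: str, position: int
) -> Dict[str, Counter]:
    """Tally the residue at one reference position, separately for each group."""
    col = column_for_position(alignment[ref_id], position)
    table: Dict[str, Counter] = defaultdict(Counter)
    for seq_id, aligned in alignment.items():
        table[groups[seq_id]][aligned[col]] += 1
    return dict(table)
```

A residue whose counts fall almost entirely into one group (as reported for F224 in core β-tubulins versus Y224 in divergent ones) is a candidate lineage marker; the gap-aware mapping matters because alignment columns rarely coincide with reference residue numbers.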
Impact of shortened crop rotation of oilseed rape on soil and rhizosphere microbial diversity in relation to yield decline
Oilseed rape (OSR) grown in monoculture shows a yield decline of up to 25% relative to virgin OSR, but the mechanisms responsible are unknown. A long-term field experiment of OSR grown in a range of rotations with wheat was used to determine whether shifts in fungal and bacterial populations of the rhizosphere and bulk soil were associated with the development of OSR yield decline. The fungal and bacterial communities in the rhizosphere and bulk soil from the field experiment were profiled using terminal restriction fragment length polymorphism (TRFLP) and sequencing of cloned internal transcribed spacer regions and 16S rRNA genes, respectively. OSR cropping frequency had no effect on rhizosphere bacterial communities. However, the rhizosphere fungal communities from continuously grown OSR were significantly different from those from other rotations. This was due primarily to an increase in the abundance of two fungi that showed 100% and 95% DNA identity to the plant pathogens Olpidium brassicae and Pyrenochaeta lycopersici, respectively. Real-time PCR confirmed that these fungi were significantly more abundant in continuously grown OSR than in the other rotations. These two fungi were isolated from the field and used to inoculate OSR and Brassica oleracea grown under controlled conditions in a glasshouse to determine their effect on yield. At high doses, Olpidium brassicae reduced top growth and root biomass in seedlings and reduced branching and subsequent pod and seed production. Pyrenochaeta sp. formed lesions on the roots of seedlings and, at high doses, delayed flowering and had a negative impact on seed quantity and quality.
Low-Bandwidth and Non-Compute Intensive Remote Identification of Microbes from Raw Sequencing Reads
Cheap high-throughput DNA sequencing may soon become routine not only for
human genomes but also for practically anything requiring the identification of
living organisms from their DNA: tracking of infectious agents, control of food
products, bioreactors, or environmental samples.
We propose a novel, general approach to the analysis of sequencing data in
which the reference genome does not have to be specified. Using a distributed
architecture, we query a remote server for hints about what the reference might
be while transferring a relatively small amount of data; these hints can then
guide more computationally demanding work.
Our system consists of a server that indexes known reference DNA and a
client that holds raw sequencing reads. The client sends a sample of unidentified
reads and in return receives a list of matching references known to the
server. Sequences for these references can then be retrieved and used for
exhaustive computation on the reads, such as alignment.
To demonstrate this approach, we have implemented a web server that indexes tens
of thousands of publicly available genomes and genomic regions from various
organisms and returns lists of matching hits for query sequencing reads. We
have also implemented two clients, one of them running in a web browser, to
demonstrate that gigabytes of raw sequencing reads of unknown origin
could be identified without transferring a very large volume of data,
and on modestly powered computing devices.
Web access is available at http://tapir.cbs.dtu.dk. The source code for a
Python command-line client, a server, and supplementary data is available at
http://bit.ly/1aURxkc
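The abstract describes the client/server protocol only at a high level. The sketch below is one hypothetical way to realize the same division of work: the server holds a k-mer index over known references, the client submits a small random sample of its reads, and the server replies with references ranked by the number of sampled reads that share at least one k-mer with them. Exact k-mer lookup, the value of k, the ranking rule, and all names are assumptions, not the implementation behind the web service; the point is that only the sampled reads and a short list of reference names cross the network.

```python
import random
from collections import Counter
from typing import Dict, Iterator, List, Sequence, Set, Tuple


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class ReferenceServer:
    """Server side: index the k-mers of known reference sequences."""

    def __init__(self, references: Dict[str, str], k: int = 12) -> None:
        self.k = k
        self.index: Dict[str, Set[str]] = {}
        for name, seq in references.items():
            for km in kmers(seq, k):
                self.index.setdefault(km, set()).add(name)

    def query(self, reads: Sequence[str], top: int = 5) -> List[Tuple[str, int]]:
        """Rank references by how many submitted reads share a k-mer with them."""
        hits: Counter = Counter()
        for read in reads:
            matched: Set[str] = set()
            for km in kmers(read, self.k):
                matched |= self.index.get(km, set())
            hits.update(matched)  # each read counts at most once per reference
        return hits.most_common(top)


def client_sample(reads: Sequence[str], n: int, seed: int = 0) -> List[str]:
    """Client side: send only a small random sample of reads, not the whole run."""
    rng = random.Random(seed)
    return rng.sample(list(reads), min(n, len(reads)))
```

After receiving the ranked names, the client would fetch only those reference sequences and run the expensive step (for example, full alignment of all reads) locally.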
Genetic and biochemical analyses of chromosome and plasmid gene homologues encoding ICL and ArCP domains in Vibrio anguillarum strain 775
Anguibactin, the siderophore produced by Vibrio anguillarum 775, is synthesized from 2,3-dihydroxybenzoic acid (DHBA), cysteine and hydroxyhistamine via a nonribosomal peptide synthetase (NRPS) mechanism. Most of the genes encoding anguibactin biosynthetic proteins are harbored by the pJM1 plasmid. In this work, we report the identification of a homologue of the plasmid-encoded angB on the chromosome of strain 775. The products of both genes harbor an isochorismate lyase (ICL) domain that converts isochorismic acid to 2,3-dihydro-2,3-dihydroxybenzoic acid, one of the steps of DHBA synthesis. We show that both ICL domains are functional in the production of DHBA in V. anguillarum as well as in E. coli. Substitution by alanine of the aspartic acid residue in the active site of both ICL domains completely abolishes their isochorismate lyase activity in vivo. The two proteins also carry an aryl carrier protein (ArCP) domain. In contrast with the ICL domains, only the plasmid-encoded ArCP can participate in anguibactin production, as determined by complementation analyses and by site-directed mutagenesis of the active site of the plasmid-encoded protein (S248A). The site-directed mutants D37A in the ICL domain and S248A in the ArCP domain of the plasmid-encoded AngB were also tested in vitro; they clearly show the importance of each residue for its domain's function and that each domain operates independently.
Revealing natural relationships among arbuscular mycorrhizal fungi: culture line BEG47 represents Diversispora epigaea, not Glomus versiforme
Background: Understanding the mechanisms underlying biological phenomena, such as evolutionarily conservative trait inheritance, is predicated on knowledge of the natural relationships among organisms. However, despite their enormous ecological significance, many of the ubiquitous soil inhabiting and plant symbiotic arbuscular mycorrhizal fungi (AMF, phylum Glomeromycota) are incorrectly classified.
Methodology/Principal Findings:
Here, we focused on a frequently used model AMF registered as culture BEG47. This fungus is a descendant of the ex-type culture lineage of Glomus epigaeum, which in 1983 was synonymised with Glomus versiforme. It has since been used as ‘G. versiforme BEG47’. We show, by morphological comparisons based on type material of G. versiforme (collected 1860–61) and on type material and living ex-type cultures of G. epigaeum, that these two AMF species cannot be conspecific, and by molecular phylogenetics that BEG47 is a member of the genus Diversispora.
Conclusions: This study highlights that experimental work published over the last 25 years or more on an AMF named ‘G. versiforme’ or ‘BEG47’ refers to D. epigaea, a species that is evolutionarily separated by hundreds of millions of years from all members of the genera in the Glomerales and thus from most other commonly used AMF ‘laboratory strains’. Detailed redescriptions substantiate the renaming of G. epigaeum (BEG47) as D. epigaea, positioning it systematically in the order Diversisporales and thus enabling an evolutionary understanding of its genetic, physiological, and ecological traits relative to those of other AMF. Diversispora epigaea is widely cultured as a laboratory strain of AMF, whereas G. versiforme appears to have been neither cultured nor found in the field since its original description.
