Simcluster: clustering enumeration gene expression data on the simplex space
Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. Like other counting or voting processes, these measurements constitute compositional data, which exhibit properties particular to the simplex space, where the sum of the components is constrained. These properties are absent from regular Euclidean spaces, on which hybridization-based microarray data are often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space.

Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster.

Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
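To illustrate why the simplex constraint matters for clustering, the following is a minimal sketch of the centered log-ratio (clr) transform and the Aitchison distance from the general compositional data analysis framework. It is not Simcluster's own implementation. The function names and tag counts are hypothetical, and the sketch assumes strictly positive counts (zeros would first need a pseudocount or similar treatment).

```python
import math

def closure(counts):
    """Rescale raw counts so that the components sum to 1 (a point on the simplex)."""
    total = sum(counts)
    return [c / total for c in counts]

def clr(composition):
    """Centered log-ratio transform; assumes every component is strictly positive."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr-transformed compositions of x and y."""
    cx, cy = clr(closure(x)), clr(closure(y))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))

# Hypothetical tag counts for three libraries (illustrative values only).
library_a = [120, 30, 50]
library_b = [1200, 300, 500]  # same proportions as library_a, ten times the depth
library_c = [30, 120, 50]     # first two tags swapped relative to library_a

print(aitchison_distance(library_a, library_b))  # essentially zero: depth does not matter
print(aitchison_distance(library_a, library_c))  # positive: the proportions differ
```

Because the distance depends only on ratios between components, two libraries that differ solely in sequencing depth are treated as identical, which a plain Euclidean distance on raw counts would not do.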
Amino Acid Usage Is Asymmetrically Biased in AT- and GC-Rich Microbial Genomes.
INTRODUCTION: Genomic base composition ranges from less than 25% AT to more than 85% AT in prokaryotes. Since only a small fraction of prokaryotic genomes is non-coding, even a minor change in genomic base composition will induce profound protein changes. We examined how amino acid and codon frequencies were distributed in over 2000 microbial genomes and how these distributions were affected by base compositional changes. In addition, we wanted to know how genome-wide amino acid usage was biased in the different genomes and how changes to base composition and mutations affected this bias. To this end, we used a Generalized Additive Mixed-effects Model (GAMM) to explore non-linear associations and strong data dependences in closely related microbes; principal component analysis (PCA) was used to examine genomic amino acid and codon frequencies, while the concept of relative entropy was used to analyze genomic mutation rates. RESULTS: We found that genomic amino acid frequencies carried a stronger phylogenetic signal than codon frequencies, but that this signal was weak compared to that of genomic %AT. Further, in contrast to codon usage bias (CUB), amino acid usage bias (AAUB) was differently distributed in AT- and GC-rich genomes, in the sense that AT-rich genomes did not prefer specific amino acids over others to the same extent as GC-rich genomes. AAUB was also associated with relative entropy; genomes with low AAUB contained more random mutations, as a consequence of relaxed purifying selection, than genomes with higher AAUB. CONCLUSION: Genomic base composition has a substantial effect on both amino acid and codon frequencies in bacterial genomes. While phylogeny influenced amino acid usage more in GC-rich genomes, AT content was driving amino acid usage in AT-rich genomes. We found the GAMM to be an excellent tool for analyzing the genomic data used in this study.
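The abstract does not state which distributions the authors compared when computing relative entropy, so the following is only a generic sketch of the quantity itself (the Kullback-Leibler divergence, in bits). The base frequencies are hypothetical and the function name is an assumption made for illustration.

```python
import math

def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits for two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of categories")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log2(pi / qi)
    return total

# Hypothetical base frequencies (A, C, G, T) of an AT-rich genome,
# compared against a uniform reference distribution.
at_rich = [0.4, 0.1, 0.1, 0.4]
uniform = [0.25, 0.25, 0.25, 0.25]

print(relative_entropy(at_rich, uniform))  # grows as composition departs from uniform
print(relative_entropy(uniform, uniform))  # 0.0 for identical distributions
```

Relative entropy is zero only when the two distributions coincide, and it is not symmetric in its arguments, so the choice of reference distribution matters.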
Development and quantitative analyses of a universal rRNA-subtraction protocol for microbial metatranscriptomics
Metatranscriptomes generated by pyrosequencing hold significant potential for describing functional processes in complex microbial communities. Meeting this potential requires protocols that maximize mRNA recovery by reducing the relative abundance of ribosomal RNA, as well as systematic comparisons to identify methodological artifacts and test for reproducibility across data sets. Here, we implement a protocol for subtractive hybridization of bacterial rRNA (16S and 23S) that uses sample-specific probes and is applicable across diverse environmental samples. To test this method, rRNA-subtracted and unsubtracted transcriptomes were sequenced (454 FLX technology) from bacterioplankton communities at two depths in the oligotrophic open ocean, yielding 10 data sets representing ~350 Mbp. Subtractive hybridization reduced bacterial rRNA transcript abundance by 40–58%, increasing recovery of non-rRNA sequences up to fourfold (from 12–20% of total sequences to 40–49%). In testing this method, we established criteria for detecting sequences replicated artificially via pyrosequencing errors and identified such replicates as a significant component (6–39%) of total pyrosequencing reads. Following replicate removal, statistical comparisons of reference genes (identified via BLASTX to NCBI-nr) between technical replicates and between rRNA-subtracted and unsubtracted samples showed low levels of differential transcript abundance (<0.2% of reference genes). However, gene overlap between data sets was remarkably low, with no two data sets (including duplicate runs from the same pyrosequencing library template) sharing greater than 17% of unique reference genes. These results indicate that pyrosequencing captures a small subset of total mRNA diversity and underscore the importance of reliable rRNA subtraction procedures to enhance sequencing coverage across the functional transcript pool.
Funding: Agouron Institute; Gordon and Betty Moore Foundation; United States. Dept. of Energy. Office of Science; National Science Foundation (U.S.) (NSF Science and Technology Center Award EF0424599)
Molecular mechanisms of toxicity of silver nanoparticles in zebrafish embryos.
Addresses: Biosciences, College of Life and Environmental Sciences, Geoffrey Pope Building, University of Exeter, Stocker Road, Exeter, EX4 4QD, UK.
Journal Article; Research Support, Non-U.S. Gov't. This is an open access article that is freely available in ORE or from the publisher's web site: http://pubs.acs.org/doi/abs/10.1021/es401758d. Please cite the published version. © 2013 American Chemical Society.
Supporting Information: Further details on the methodology and results for the characterization of the silver particles used for the exposures, mortality curves, sequencing analysis, and a number of supporting figures and tables. This material is available free of charge via the Internet at http://pubs.acs.org.
Silver nanoparticles cause toxicity in exposed organisms and are an environmental health concern. The mechanisms of silver nanoparticle toxicity, however, remain unclear. We examined the effects of exposure to silver in nano-, bulk-, and ionic forms on zebrafish embryos (Danio rerio) using a Next Generation Sequencing approach in an Illumina platform (High-Throughput SuperSAGE). Significant alterations in gene expression were found for all treatments and many of the gene pathways affected, most notably those associated with oxidative phosphorylation and protein synthesis, overlapped strongly between the three treatments indicating similar mechanisms of toxicity for the three forms of silver studied. Changes in oxidative phosphorylation indicated a down-regulation of this pathway at 24 h of exposure, but with a recovery at 48 h. This finding was consistent with a dose-dependent decrease in oxygen consumption at 24 h, but not at 48 h, following exposure to silver ions. Overall, our data provide support for the hypothesis that the toxicity caused by silver nanoparticles is principally associated with bioavailable silver ions in exposed zebrafish embryos. These findings are important in the evaluation of the risk that silver particles may pose to exposed vertebrate organisms.
Funding: Natural Environment Research Council (NERC); NERC Biomolecular Analysis Facility; UK Environment Agency; Systems Biology Seed fund, University of Exeter
The Transcriptional Response to DNA-Double-Strand Breaks in Physcomitrella patens
The model bryophyte Physcomitrella patens is unique among plants in supporting the generation of mutant alleles by facile homologous recombination-mediated gene targeting (GT). Reasoning that targeted transgene integration occurs through the capture of transforming DNA by the homology-dependent pathway for DNA double-strand break (DNA-DSB) repair, we analysed the genome-wide transcriptomic response to bleomycin-induced DNA damage and generated mutants in candidate DNA repair genes. Massively parallel (Illumina) cDNA sequencing identified potential participants in gene targeting. Transcripts encoding DNA repair proteins active in multiple repair pathways were significantly up-regulated. These included Rad51, CtIP, DNA ligase 1, Replication protein A and ATR in homology-dependent repair; Xrcc4, DNA ligase 4, Ku70 and Ku80 in non-homologous end-joining; and Rad1, Tebichi/polymerase theta and PARP in microhomology-mediated end-joining. Differentially regulated cell-cycle components included up-regulated Rad9 and Hus1 DNA-damage-related checkpoint proteins and down-regulated D-type cyclins and B-type CDKs, commensurate with the imposition of a checkpoint at G2 of the cell cycle characteristic of homology-dependent DNA-DSB repair. Candidate genes, including ATP-dependent chromatin remodelling helicases associated with repair and recombination, were knocked out and analysed for growth defects, hypersensitivity to DNA damage and reduced GT efficiency. Targeted knockout of PpCtIP, a cell-cycle activated mediator of homology-dependent DSB resection, resulted in bleomycin hypersensitivity and greatly reduced GT efficiency.
An EMT-Driven Alternative Splicing Program Occurs in Human Breast Cancer and Modulates Cellular Phenotype
Epithelial-mesenchymal transition (EMT), a mechanism important for embryonic development, plays a critical role during malignant transformation. While much is known about transcriptional regulation of EMT, alternative splicing of several genes has also been correlated with EMT progression, but the extent of splicing changes and their contributions to the morphological conversion accompanying EMT have not been investigated comprehensively. Using an established cell culture model and RNA-Seq analyses, we determined an alternative splicing signature for EMT. Genes encoding key drivers of EMT-dependent changes in cell phenotype, such as actin cytoskeleton remodeling, regulation of cell-cell junction formation, and regulation of cell migration, were enriched among EMT-associated alternative splicing events. Our analysis suggested that most EMT-associated alternative splicing events are regulated by one or more members of the RBFOX, MBNL, CELF, hnRNP, or ESRP classes of splicing factors. The EMT alternative splicing signature was confirmed in human breast cancer cell lines, which could be classified into basal and luminal subtypes based exclusively on their EMT-associated splicing pattern. Expression of EMT-associated alternative mRNA transcripts was also observed in primary breast cancer samples, indicating that EMT-dependent splicing changes occur commonly in human tumors. The functional significance of EMT-associated alternative splicing was tested by expression of the epithelial-specific splicing factor ESRP1 or by depletion of RBFOX2 in mesenchymal cells, both of which elicited significant changes in cell morphology and motility towards an epithelial phenotype, suggesting that splicing regulation alone can drive critical aspects of EMT-associated phenotypic changes. The molecular description obtained here may aid in the development of new diagnostic and prognostic markers for analysis of breast cancer progression.
Funding: National Institutes of Health (U.S.) (R01-HG002439); National Science Foundation (U.S.) (equipment grant); National Institutes of Health (U.S.) (Integrative Cancer Biology Program Grant U54-CA112967); David H. Koch Institute for Integrative Cancer Research at MIT (Ludwig Center for Metastasis Research); David H. Koch Institute for Integrative Cancer Research at MIT; Massachusetts Institute of Technology (Croucher Scholarship); Massachusetts Institute of Technology (Ludwig Fund postdoctoral fellowship); National Institutes of Health (U.S.) (NIH CA100324); National Institutes of Health (U.S.) (AECC9526-5267
Metathesis of Fatty Acid Ester Derivatives in 1,1-Dialkyl and 1,2,3-Trialkyl Imidazolium Type Ionic Liquids
The self-metathesis of methyl oleate and methyl ricinoleate was carried out in the presence of ruthenium alkylidene catalysts 1–4 in [bmim][X] and [bdmim][X] type room temperature ionic liquids (RTILs) (X = PF6−, BF4− and NTf2−) and followed by gas chromatography. Catalytic performance was better in [bdmim][X] type ionic liquids than in [bmim][X] type ionic liquids. Catalyst recycling studies were also carried out in the RTILs with catalysts 1–4 in order to explore their possible industrial application.
The Genome of Borrelia recurrentis, the Agent of Deadly Louse-Borne Relapsing Fever, Is a Degraded Subset of Tick-Borne Borrelia duttonii
In an effort to understand how a tick-borne pathogen adapts to the body louse, we sequenced and compared the genomes of the recurrent fever agents Borrelia recurrentis and B. duttonii. The 1,242,163–1,574,910-bp fragmented genomes of B. recurrentis and B. duttonii contain a unique 23-kb linear plasmid. This linear plasmid exhibits a large poly-T tract within the promoter region of an intact variable large protein gene and a telomere resolvase that is unique to Borrelia. The genome content is characterized by several repeat families, including antigenic lipoproteins. B. recurrentis exhibited a 20.4% genome size reduction and appeared to be a strain of B. duttonii with a decaying genome, possibly due to the accumulation of genomic errors induced by the loss of recA and mutS. Accompanying this were increases in the number of impaired genes and a reduction in coding capacity, including surface-exposed lipoproteins and putative virulence factors. Analysis of the reconstructed ancestral sequence compared to B. duttonii and B. recurrentis was consistent with the accelerated evolution observed in B. recurrentis. Vector specialization of louse-borne pathogens responsible for major epidemics was associated with rapid genome reduction. The correlation between gene loss and increased virulence of B. recurrentis parallels that of Rickettsia prowazekii, with both species being genomic subsets of less-virulent strains.
PatternLab for proteomics: a tool for differential shotgun proteomics
Background: A goal of proteomics is to distinguish between states of a biological system by identifying protein expression differences. Liu et al. demonstrated a method to perform semi-relative protein quantitation in shotgun proteomics data by correlating the number of tandem mass spectra obtained for each protein, or "spectral count", with its abundance in a mixture; however, two issues have remained open: how to normalize spectral counting data and how to efficiently pinpoint differences between profiles. Moreover, Chen et al. recently showed how to increase the number of identified proteins in shotgun proteomics by analyzing samples with different MS-compatible detergents while performing proteolytic digestion. The latter introduced new challenges from the data analysis perspective, since replicate readings are not acquired.

Results: To address the open issues above, we present a program termed PatternLab for proteomics. This program implements existing strategies and adds two new methods to pinpoint differences in protein profiles. The first method, ACFold, addresses experiments with fewer than three replicates from each state or having assays acquired by different protocols as described by Chen et al. ACFold uses a combined criterion based on expression fold changes, the AC test, and the false-discovery rate, and can supply a "bird's-eye view" of differentially expressed proteins. The other method addresses experimental designs having multiple readings from each state and is referred to as nSVM (natural support vector machine) because of its roots in evolutionary computing and in statistical learning theory. Our observations suggest that nSVM's niche comprises projects that select a minimum set of proteins for classification purposes; for example, the development of an early detection kit for a given pathology. We demonstrate the effectiveness of each method on experimental data and compare them with existing strategies.

Conclusion: PatternLab offers easy and unified access to a variety of feature selection and normalization strategies, each having its own niche. Additionally, graphing tools are available to aid in the analysis of high-throughput experimental data. PatternLab is available at http://pcarvalho.com/patternlab.
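The abstract names the AC (Audic-Claverie) test as one ingredient of ACFold but does not describe how the criteria are combined. As a hedged illustration, the sketch below computes the Audic-Claverie probability of observing y counts in one sample given x counts in another, plus a one-sided tail probability. The function names, the tail convention, and the example counts are assumptions made here and are not taken from PatternLab.

```python
import math

def audic_claverie_prob(x, y, n1, n2):
    """Audic-Claverie p(y | x): probability of y counts in sample 2 (total n2)
    given x counts in sample 1 (total n1), computed in log space for stability."""
    r = n2 / n1
    log_p = (y * math.log(r)
             + math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
             - (x + y + 1) * math.log(1 + r))
    return math.exp(log_p)

def ac_tail_pvalue(x, y, n1, n2):
    """Smaller of the two one-sided tails, P(Y <= y) and P(Y >= y).
    This tail convention is an illustrative choice, not PatternLab's rule."""
    lower = sum(audic_claverie_prob(x, k, n1, n2) for k in range(y + 1))
    upper = 1.0 - lower + audic_claverie_prob(x, y, n1, n2)
    return min(lower, upper)

def fold_change(x, y, n1, n2):
    """Ratio of normalized counts; requires x > 0."""
    return (y / n2) / (x / n1)

# Hypothetical spectral counts for one protein in two states,
# each state having 1000 total spectra.
print(fold_change(10, 40, 1000, 1000))     # 4.0
print(ac_tail_pvalue(10, 40, 1000, 1000))  # small value: unlikely under equal abundance
```

In a full analysis the per-protein p-values would still need a multiple-testing correction to control the false-discovery rate, which this sketch omits.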
GO Explorer: A gene-ontology tool to aid in the interpretation of shotgun proteomics data
Background: Spectral counting is a shotgun proteomics approach comprising the identification and relative quantitation of thousands of proteins in complex mixtures. However, this strategy generates bewildering amounts of data whose biological interpretation is a challenge.

Results: Here we present a new algorithm, termed GO Explorer (GOEx), that leverages the gene ontology (GO) to aid in the interpretation of proteomic data. GOEx stands out because it combines data from protein fold changes with GO over-representation statistics to help draw conclusions. Moreover, it is tightly integrated within the PatternLab for Proteomics project and, thus, lies within a complete computational environment that provides parsers and pattern recognition tools designed for spectral counting. GOEx offers three independent methods to query data: an interactive directed acyclic graph, a specialist mode where key words can be searched, and an automatic search. Its usefulness is demonstrated by applying it to help interpret the effects of perillyl alcohol, a natural chemotherapeutic agent, on glioblastoma multiforme cell lines (A172). We used a new multi-surfactant shotgun proteomic strategy and identified more than 2600 proteins; GOEx pinpointed key sets of differentially expressed proteins related to cell cycle, alcohol catabolism, the Ras pathway, apoptosis, and stress response, to name a few.

Conclusion: GOEx facilitates organism-specific studies by leveraging GO and providing a rich graphical user interface. It is a simple-to-use tool, specialized for biologists who wish to analyze spectral counting data from shotgun proteomics. GOEx is available at http://pcarvalho.com/patternlab.
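GO over-representation is commonly assessed with a hypergeometric upper-tail test. The abstract does not say which statistic GOEx uses, so the following is only a generic sketch of that common approach. The function name is an assumption, and apart from the 2600 identified proteins mentioned above, all numbers are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` proteins without replacement from
    `population`, of which `annotated` carry the GO term of interest."""
    max_hits = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(comb(annotated, k) * comb(population - annotated, sample - k)
               for k in range(hits, max_hits + 1))
    return tail / total

# 2600 identified proteins (from the abstract); the remaining numbers are
# hypothetical: 100 proteins annotated with a term, 200 differentially
# expressed proteins, 20 of which carry the term.
p_value = hypergeom_upper_tail(2600, 100, 200, 20)
print(p_value)  # small when the term is over-represented among the selected proteins
```

Here about 7.7 annotated proteins would be expected by chance among the 200 selected, so observing 20 gives a small tail probability. Testing many GO terms at once would again call for a multiple-testing correction.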
