Biomarker discovery and redundancy reduction towards classification using a multi-factorial MALDI-TOF MS T2DM mouse model dataset
Diabetes, like many diseases and biological processes, is not mono-causal. On the one hand, multifactorial studies with complex experimental designs are required for its comprehensive analysis. On the other hand, the data from these studies often contain substantial redundancy, for example proteins that are typically represented by a multitude of peptides. Coping with both complexities (experimental and technological) simultaneously makes data analysis a challenge for bioinformatics.
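The abstract does not say how redundancy is reduced. As a hypothetical illustration only, assuming that peptides from the same protein yield strongly correlated peak intensities, a greedy correlation filter keeps one representative per group of correlated features. The function name `reduce_redundancy` and the 0.9 threshold are assumptions, not the authors' method.

```python
import numpy as np


def reduce_redundancy(X, threshold=0.9):
    """Greedy correlation filter over the columns (features) of X.

    Hypothetical sketch: features are visited from highest to lowest variance,
    and a feature is kept only if its absolute Pearson correlation with every
    already-kept feature is at most `threshold`. Assumes no column is constant
    (np.corrcoef would return NaN for it).
    """
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)      # feature-by-feature correlation matrix
    order = np.argsort(-X.var(axis=0))    # highest-variance features first
    kept = []
    for j in order:
        if all(abs(R[j, k]) <= threshold for k in kept):
            kept.append(int(j))
    return sorted(kept)


# Toy data: column 1 is roughly 2 * column 0 (e.g. two peptides of one protein);
# column 2 is unrelated to both.
X = np.array([
    [1.0, 2.1, 1.0],
    [2.0, 3.9, -1.0],
    [3.0, 6.2, 1.0],
    [4.0, 7.9, -1.0],
    [5.0, 10.1, 1.0],
])
print(reduce_redundancy(X))  # [1, 2]
```

Column 0 is dropped because it is almost perfectly correlated with the higher-variance column 1; the unrelated column 2 survives.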
Nonnegative principal component analysis for mass spectral serum profiles and biomarker discovery
BACKGROUND: As a novel cancer diagnostic paradigm, mass spectroscopic serum proteomic pattern diagnostics has been reported to be superior to conventional serologic cancer biomarkers. However, its clinical use has not yet been fully validated. An important factor preventing this young technology from becoming a mainstream cancer diagnostic paradigm is that robustly identifying cancer molecular patterns from high-dimensional protein expression data remains a challenge in machine learning and oncology research. As a well-established dimension reduction technique, PCA is widely integrated into pattern recognition analysis to discover cancer molecular patterns. However, its global feature selection mechanism prevents it from capturing local features. This can make high-performance proteomic pattern discovery difficult, because only features that explain global data behavior are used to train a learning machine. METHODS: In this study, we develop a nonnegative principal component analysis (NPCA) algorithm and present an NPCA-based support vector machine algorithm with sparse coding for high-performance proteomic pattern classification. We also propose an NPCA-based filter-wrapper biomarker capturing algorithm for mass spectral serum profiles. RESULTS: We demonstrate the superiority of the proposed algorithm by comparison with six peer algorithms on four benchmark datasets. We also show that NPCA can be used effectively to capture meaningful biomarkers. CONCLUSION: Our analysis suggests that NPCA effectively conducts local feature selection for mass spectral profiles, contributes to improved sensitivity and specificity in subsequent classification, and supports meaningful biomarker discovery.
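The abstract does not give the NPCA update rule. As a stand-in, the sketch below uses a simple projected power iteration for the leading nonnegative component, that is, approximately maximizing $w^\top C w$ subject to $\lVert w \rVert = 1$ and $w \ge 0$, where $C$ is the sample covariance. This is a swapped-in heuristic, not the authors' algorithm; the function name and parameters are assumptions, and because the problem is non-convex the iteration can stop at a local optimum.

```python
import numpy as np


def nonnegative_leading_pc(X, n_iter=1000, tol=1e-10, seed=0):
    """Approximate the leading nonnegative principal component of X (rows = samples).

    Projected power iteration: multiply by the covariance, clip negative
    entries to zero, renormalize. A heuristic; it may converge to a local optimum.
    """
    Xc = X - X.mean(axis=0)                      # center each feature
    C = Xc.T @ Xc / (Xc.shape[0] - 1)            # sample covariance matrix
    rng = np.random.default_rng(seed)
    w = np.abs(rng.standard_normal(C.shape[0]))  # random nonnegative start
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        v = np.maximum(C @ w, 0.0)               # power step, then project onto w >= 0
        norm = np.linalg.norm(v)
        if norm == 0.0:                          # projection removed everything
            break
        v /= norm
        converged = np.linalg.norm(v - w) < tol
        w = v
        if converged:
            break
    return w


# Usage: project centered spectra onto the component to get one score per sample.
X = np.array([[1.0, 2.0], [2.0, 3.9], [3.0, 6.1], [4.0, 8.0]])
w = nonnegative_leading_pc(X)
scores = (X - X.mean(axis=0)) @ w
```

Nonnegative loadings keep each component an additive combination of m/z features, which is the property that makes such components easier to read as local peak groups.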
MetaboSearch: Tool for Mass-Based Metabolite Identification Using Multiple Databases
Searching metabolites against databases according to their masses is often the first step in metabolite identification for a mass spectrometry-based untargeted metabolomics study. Major metabolite databases include Human Metabolome DataBase (HMDB), Madison Metabolomics Consortium Database (MMCD), Metlin, and LIPID MAPS. Because each of these databases covers only a fraction of the metabolome, integrating the search results from them is expected to yield more comprehensive coverage. However, manually combining multiple search results is difficult when hundreds of metabolites need to be identified. We have implemented a web-based software tool that enables simultaneous mass-based search against the four major databases and integrates the results. In addition, more complete chemical identifier information for the metabolites is retrieved by cross-referencing multiple databases. The search results are merged based on IUPAC International Chemical Identifier (InChI) keys. Besides a simple list of m/z values, the software can accept ion annotation information as input for enhanced metabolite identification. The performance of the software is demonstrated on mass spectrometry data acquired in both positive and negative ionization modes. Compared with search results from individual databases, MetaboSearch provides better coverage of the metabolome and more complete chemical identifier information. Availability: The software tool is available at http://omics.georgetown.edu/MetaboSearch.html
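The search-then-merge idea can be sketched with in-memory tables. This is not MetaboSearch's implementation: the toy databases `DB_A` and `DB_B`, the placeholder keys such as `KEY-HEXOSE`, the assumption that every input m/z is a singly charged [M+H]+ ion, and the 10 ppm tolerance are all illustrative assumptions. Only the idea of merging hits on the InChIKey comes from the text above.

```python
from collections import defaultdict

PROTON_MASS = 1.007276  # Da; converts an [M+H]+ m/z to a neutral monoisotopic mass


def ppm_error(observed, theoretical):
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def search_and_merge(mz_values, databases, tol_ppm=10.0):
    """Search each m/z (assumed [M+H]+, z = 1) in every database; merge hits by InChIKey."""
    merged = defaultdict(lambda: {"sources": set(), "names": set(), "query_mz": set()})
    for mz in mz_values:
        neutral_mass = mz - PROTON_MASS
        for db_name, entries in databases.items():
            for entry in entries:
                if abs(ppm_error(neutral_mass, entry["mono_mass"])) <= tol_ppm:
                    record = merged[entry["inchikey"]]  # same key -> same merged record
                    record["sources"].add(db_name)
                    record["names"].add(entry["name"])
                    record["query_mz"].add(mz)
    return dict(merged)


# Hypothetical toy "databases": keys and names are placeholders, not real records.
# 180.06339 Da is the monoisotopic mass of C6H12O6.
databases = {
    "DB_A": [
        {"inchikey": "KEY-HEXOSE", "name": "D-Glucose", "mono_mass": 180.06339},
        {"inchikey": "KEY-OTHER", "name": "Unrelated compound", "mono_mass": 250.0},
    ],
    "DB_B": [
        {"inchikey": "KEY-HEXOSE", "name": "Glucose", "mono_mass": 180.06339},
    ],
}
hits = search_and_merge([181.0707], databases)
print(sorted(hits["KEY-HEXOSE"]["sources"]))  # ['DB_A', 'DB_B']
```

Keying on the InChIKey collapses the differently named entries ("D-Glucose", "Glucose") into one record while keeping track of which databases reported it.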
Acquisition of estrogen independence induces TOB1-related mechanisms supporting breast cancer cell proliferation
© 2016 Macmillan Publishers Limited. Resistance to therapies targeting the estrogen pathway remains a challenge in the treatment of estrogen receptor-positive breast cancer. To address this challenge, a systems biology approach was used. A library of small interfering RNAs targeting an estrogen receptor (ER)- and aromatase-centered network identified 46 genes that are dispensable in estrogen-dependent MCF7 cells but are selectively required for the survival of estrogen-independent MCF7-derived cells and of multiple additional estrogen-independent breast cancer cell lines. Integration of this information identified the tumor suppressor gene TOB1 as a critical determinant of estrogen-independent ER-positive breast cell survival. Depletion of TOB1 selectively promoted G1-phase arrest and sensitivity to AKT and mammalian target of rapamycin (mTOR) inhibitors in estrogen-independent cells but not in estrogen-dependent cells. Phosphoproteomic profiles from reverse-phase protein array analysis, supported by mRNA profiling, identified significant TOB1-dependent reprogramming of the signaling network that differed between estrogen-sensitive and estrogen-resistant cell lines. These data support a novel function for TOB1 in mediating the survival of estrogen-independent breast cancers. These studies also provide evidence for combining TOB1 inhibition with AKT/mTOR inhibition as a therapeutic strategy, with potential translational significance for the management of patients with ER-positive breast cancers.
Particle Swarm Optimization with Reinforcement Learning for the Prediction of CpG Islands in the Human Genome
BACKGROUND: Genomic regions that are rich in GC nucleotides, have a high CpG count, and are longer than 200 bp are often referred to as CpG islands. These islands are usually located at the 5' ends of genes. Several algorithms for predicting CpG islands have recently been proposed. METHODOLOGY/PRINCIPAL FINDINGS: We propose a new method, CPSORL, which combines a complement particle swarm optimization algorithm with reinforcement learning to predict CpG islands more reliably. Several previously developed CpG island prediction tools rely on a sliding-window technique. However, the quality of their results appears to depend heavily on the chosen window size, so these methods leave room for improvement. CONCLUSIONS/SIGNIFICANCE: Experimental results indicate that CPSORL achieves higher sensitivity and a higher correlation coefficient on all selected experimental contigs than the methods it was compared with (CpGIS, CpGcluster, CpGProd and CpGPlot). CPSORL identified more CpG islands in chromosomes 21 and 22 of the human genome than the other methods from the literature, and it achieved the highest coverage rate (3.4%). CPSORL can be used to identify promoter and TSS regions associated with CpG islands across the entire human genome. Compared with CpGcluster, the islands predicted by CPSORL covered a larger portion of the TSS (12.2%) and promoter (26.1%) regions. When Alu sequences are considered, the islands predicted by CPSORL (Alu) covered a larger portion of the TSS (40.5%) and promoter (67.8%) regions than those predicted by CpGIS. Furthermore, CPSORL was used to verify that the average methylation density of CpG islands across the entire human genome was 5.33%.
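To make the window-size issue concrete, here is a sketch of the kind of sliding-window baseline the abstract contrasts CPSORL with; it is not CPSORL. The thresholds (200 bp window, GC content above 50%, observed/expected CpG ratio above 0.6) are the classic Gardiner-Garden and Frommer criteria, assumed here because the abstract does not state numeric cutoffs, and the function names are hypothetical.

```python
def window_stats(seq):
    """GC fraction and observed/expected CpG ratio of an uppercase DNA string."""
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")                 # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def predict_cpg_islands(seq, window=200, step=1, min_gc=0.5, min_obs_exp=0.6):
    """Merge all qualifying windows into half-open [start, end) islands."""
    seq = seq.upper()
    islands = []
    for start in range(0, len(seq) - window + 1, step):
        gc_fraction, obs_exp = window_stats(seq[start:start + window])
        if gc_fraction > min_gc and obs_exp > min_obs_exp:
            end = start + window
            if islands and start <= islands[-1][1]:   # overlaps the previous island
                islands[-1][1] = max(islands[-1][1], end)
            else:
                islands.append([start, end])
    return [tuple(island) for island in islands]


seq = "A" * 300 + "CG" * 150 + "A" * 300
print(predict_cpg_islands(seq))  # [(201, 699)]
```

The true CpG-rich block spans positions 300 to 600, yet the reported island is [201, 699): windows that only half overlap the block still pass the thresholds, so the boundaries stretch by roughly half a window on each side. Changing `window` moves those boundaries, which is the sensitivity to window size that motivates CPSORL.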
Cutaneous hypersensitivity reactions to freshwater cyanobacteria – human volunteer studies
BACKGROUND: Pruritic skin rashes associated with exposure to freshwater cyanobacteria are infrequently reported in the medical and scientific literature, mostly as anecdotal and case reports. Diagnostic dermatological investigations in humans are also infrequently described. We sought to conduct a pilot volunteer study to explore the potential for cyanobacteria to elicit hypersensitivity reactions. METHODS: A consecutive series of adult patients presenting for diagnostic skin patch testing at a hospital outpatient clinic were invited to participate. A convenience sample of volunteers matched for age and sex was also enrolled. Patches containing aqueous suspensions of various cyanobacteria at three concentrations were applied for 48 hours; dermatological assessment was made 48 hours and 96 hours after application. RESULTS: Twenty outpatients and 19 reference subjects were recruited into the study. A single outpatient produced unequivocal reactions to several cyanobacteria suspensions; this subject was also the only member of the outpatient group with a diagnosis of atopic dermatitis. No subjects in the reference group developed clinically detectable skin reactions to cyanobacteria. CONCLUSION: This preliminary clinical study indicates that hypersensitivity reactions to cyanobacteria appear to be infrequent in both the general and dermatological outpatient populations. As cyanobacteria are widely distributed in aquatic environments, a better appreciation of risk factors, particularly with respect to allergic predisposition, may help to refine health advice given to people engaging in recreational activities where nuisance cyanobacteria are a problem.
Application of multiple statistical tests to enhance mass spectrometry-based biomarker discovery
BACKGROUND: Mass spectrometry-based biomarker discovery has long been hampered by the difficulty of reconciling lists of discriminatory peaks identified by different laboratories for the same disease. We describe a multi-statistical analysis procedure that combines several independent computational methods. The approach capitalizes on the strengths of each method, applying all of them to the same high-resolution mass spectral data set to discover consensus differential mass peaks that should be robust biomarkers for distinguishing between disease states. RESULTS: The proposed methodology was applied to a pilot narcolepsy study using logistic regression, hierarchical clustering, t-test, and CART. Consensus differential mass peaks with high predictive power were identified across three of the four statistical platforms. On the diagnostic accuracy measures investigated, the performance of the consensus-peak model was intermediate between that of logistic regression and CART, which produced better models than hierarchical clustering and the t-test. However, consensus peaks confer greater confidence in their ability to distinguish between disease states, because they are not artifacts of bias toward a particular statistical algorithm. Instead, they were selected as differential under differing data distribution assumptions, demonstrating their discriminatory potential. CONCLUSION: The methodology described here is applicable to any high-resolution MALDI mass spectrometry-derived data set with minimal mass drift, which is essential for peak-to-peak comparison studies. Four statistical approaches with differing data distribution assumptions were applied to the same raw data set to obtain consensus peaks that were found to be statistically differential between the two groups compared. These consensus peaks demonstrated high diagnostic accuracy when used to form a predictive model, as evaluated by receiver operating characteristic (ROC) curve analysis. They should have a higher discriminatory ability because they are not biased toward a particular algorithm. Thus, they are prime candidates for downstream identification and validation efforts.
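The consensus step and the ROC evaluation can be sketched in a few lines. This is an illustrative sketch, not the authors' pipeline: the per-method peak lists below are made-up placeholders, and `min_votes=3` mirrors the "three of the four statistical platforms" rule above. The AUC is computed as the probability that a randomly chosen positive sample scores above a randomly chosen negative one, with ties counted as one half, which equals the area under the empirical ROC curve.

```python
from collections import Counter


def consensus_peaks(selections, min_votes):
    """Peaks selected by at least `min_votes` of the methods in `selections`."""
    votes = Counter(peak for peaks in selections.values() for peak in set(peaks))
    return sorted(peak for peak, n in votes.items() if n >= min_votes)


def roc_auc(pos_scores, neg_scores):
    """Empirical ROC AUC via pairwise comparison (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical m/z values selected by each method on the same data set.
selections = {
    "logistic_regression": [1000.5, 1200.1, 1500.7],
    "hierarchical_clustering": [1200.1],
    "t_test": [1000.5, 1200.1, 1800.2],
    "cart": [1000.5, 1200.1],
}
print(consensus_peaks(selections, min_votes=3))  # [1000.5, 1200.1]
```

Matching peaks by exact m/z only works because all methods run on the same aligned peak table; across laboratories, a mass tolerance would be needed, which is why the text stresses minimal mass drift.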
Accurate peak list extraction from proteomic mass spectra for identification and profiling studies
BACKGROUND: Mass spectrometry is an essential technique in proteomics, both for identifying the proteins in a biological sample and for comparing the proteomic profiles of different samples. In both cases, the main phase of the data analysis is the extraction of the significant features from a mass spectrum. Its final output is the so-called peak list, which contains the mass, charge, and intensity of every detected biomolecule. The main steps of peak list extraction are usually preprocessing, peak detection, peak selection, charge determination, and monoisotoping. RESULTS: This paper describes an original algorithm for peak list extraction from low- and high-resolution mass spectra. It was developed principally to improve the precision of peak extraction compared with other reference algorithms. It contains many innovative features, among them a sophisticated method for handling overlapping isotopic distributions. CONCLUSIONS: The performance of the basic version of the algorithm and of its optional functionalities is evaluated on SELDI-TOF, MALDI-TOF, and ESI-FTICR ECD mass spectra. Executable files of MassSpec, a MATLAB implementation of the peak list extraction procedure for Windows and Linux systems, can be downloaded free of charge by nonprofit institutions from http://aimed11.unipv.it/MassSpec
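Two of the listed steps, peak detection and charge determination, can be sketched in their simplest textbook form. This is not the MassSpec algorithm: the local-maximum rule, the fixed intensity threshold, the 0.01 m/z tolerance, and the function names are assumptions for illustration. Charge is inferred from the fact that adjacent isotope peaks of an ion with charge z are separated by about 1.00335/z on the m/z axis.

```python
ISOTOPE_SPACING = 1.00335  # Da, approximate 13C - 12C mass difference


def pick_peaks(mz, intensity, min_intensity):
    """Return (m/z, intensity) pairs of local maxima with intensity >= min_intensity."""
    peaks = []
    for i in range(1, len(intensity) - 1):
        is_local_max = intensity[i] > intensity[i - 1] and intensity[i] >= intensity[i + 1]
        if is_local_max and intensity[i] >= min_intensity:
            peaks.append((mz[i], intensity[i]))
    return peaks


def estimate_charge(peak_mzs, i, max_charge=4, tol=0.01):
    """Guess the charge of peak i from the spacing to a neighbouring isotope peak.

    Higher charges are tried first: the second isotope of a z = 2 envelope lies
    about 1.0067 m/z away and would otherwise be mistaken for z = 1 spacing.
    Returns None if no isotope partner is found within `tol`.
    """
    for z in range(max_charge, 0, -1):
        target = peak_mzs[i] + ISOTOPE_SPACING / z
        if any(abs(m - target) <= tol for m in peak_mzs):
            return z
    return None


print(pick_peaks([1, 2, 3, 4, 5, 6, 7], [0, 5, 1, 0, 8, 2, 0], min_intensity=3))
# [(2, 5), (5, 8)]
print(estimate_charge([1000.0, 1000.50168, 1001.00335], 0))  # 2
```

A real peak picker would smooth and baseline-correct the spectrum first and would need to separate overlapping isotopic envelopes, the harder problem the abstract highlights; this sketch does neither.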
