Bayesian meta-analysis for identifying periodically expressed genes in fission yeast cell cycle
The effort to identify genes with periodic expression during the cell cycle
from genome-wide microarray time series data has been ongoing for a decade.
However, the lack of rigorous modeling of periodic expression as well as the
lack of a comprehensive model for integrating information across genes and
experiments has impaired the effort for the accurate identification of
periodically expressed genes. To address the problem, we introduce a Bayesian
model to integrate multiple independent microarray data sets from three recent
genome-wide cell cycle studies on fission yeast. A hierarchical model is used for data integration. To facilitate efficient Monte Carlo sampling from the joint posterior distribution, we develop a novel Metropolis--Hastings group move. A surprising finding from our integrated analysis is that more than 40% of the genes in fission yeast are significantly periodically expressed, far exceeding the 10--15% reported in the current literature. This finding calls for a reconsideration of the periodically expressed gene detection problem.
Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics, http://dx.doi.org/10.1214/09-AOAS300
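The group move mentioned above is a joint Metropolis--Hastings update of several parameters at once rather than one coordinate at a time. As an illustrative sketch only (not the paper's actual sampler), the following updates all coordinates of a toy log-posterior in a single proposal; `log_post` and the step size are stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_post(theta):
    # Hypothetical log-posterior: a standard normal target as a stand-in
    # for the paper's joint posterior over gene-level parameters.
    return -0.5 * np.sum(theta**2)

def mh_group_move(theta, step=0.5, n_iter=5000):
    """Metropolis-Hastings with a joint ("group") proposal that
    perturbs all coordinates together in one accept/reject step."""
    samples = []
    lp = log_post(theta)
    for _ in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.shape)
        lp_prop = log_post(prop)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        samples.append(theta.copy())
    return np.array(samples)

chain = mh_group_move(np.zeros(3))
print(chain.mean(axis=0))  # near 0 for the standard-normal target
```

Updating parameters as a group can improve mixing when they are strongly correlated under the posterior, which is the usual motivation for such moves.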
Matching Phosphorylation Response Patterns of Antigen-Receptor-Stimulated T Cells Via Flow Cytometry
Background
When flow cytometric data on mixtures of cell populations are collected from samples under different experimental conditions, computational methods are needed (a) to classify the samples into similar groups, and (b) to characterize the changes within the corresponding populations due to the different conditions. Manual inspection has been used in the past to study such changes, but high-dimensional experiments necessitate developing new computational approaches to this problem. A robust solution to this problem is to construct distinct templates to summarize all samples from a class, and then to compare these templates to study the changes across classes or conditions.
Results
We designed a hierarchical algorithm, flowMatch, to first match the corresponding clusters across samples for producing robust meta-clusters, and to then construct a high-dimensional template as a collection of meta-clusters for each class of samples. We applied the algorithm on flow cytometry data obtained from human blood cells before and after stimulation with anti-CD3 monoclonal antibody, which is reported to change phosphorylation responses of memory and naive T cells. The flowMatch algorithm is able to construct representative templates from the samples before and after stimulation, and to match corresponding meta-clusters across templates. The templates of the pre-stimulation and post-stimulation data corresponding to memory and naive T cell populations clearly show, at the level of the meta-clusters, the overall phosphorylation shift due to the stimulation.
Conclusions
We concisely represent each class of samples by a template consisting of a collection of meta-clusters (representative abstract populations). Using flowMatch, the meta-clusters across samples can be matched to assess overall differences among the samples of various phenotypes or time-points.
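The core matching step pairs corresponding clusters across samples. A minimal sketch of that idea (not flowMatch's actual dissimilarity or algorithm) matches two samples' cluster centroids by minimizing total squared distance over all pairings; the centroids below are made up:

```python
import itertools
import numpy as np

def match_clusters(centroids_a, centroids_b):
    """Match clusters across two samples by minimizing total squared
    centroid distance. Brute force over permutations, which is fine
    for the small cluster counts typical of cytometry populations."""
    k = len(centroids_a)
    best_perm, best_cost = None, float("inf")
    for perm in itertools.permutations(range(k)):
        cost = sum(np.sum((centroids_a[i] - centroids_b[j]) ** 2)
                   for i, j in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost

# Two hypothetical samples with the same two populations listed
# in different orders.
a = np.array([[0.0, 0.0], [5.0, 5.0]])
b = np.array([[5.1, 4.9], [0.1, -0.1]])
perm, cost = match_clusters(a, b)
print(perm)  # (1, 0): cluster 0 in sample A matches cluster 1 in sample B
```

A matched set of clusters across all samples of a class then forms a meta-cluster, and the collection of meta-clusters forms the class template.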
Joint Modeling and Registration of Cell Populations in Cohorts of High-Dimensional Flow Cytometric Data
In systems biomedicine, an experimenter encounters different potential
sources of variation in data such as individual samples, multiple experimental
conditions, and multi-variable network-level responses. In multiparametric
cytometry, which is often used for analyzing patient samples, such issues are
critical. While computational methods can identify cell populations in
individual samples, without the ability to automatically match them across
samples, it is difficult to compare and characterize the populations in typical
experiments, such as those responding to various stimulations or distinctive of
particular patients or time-points, especially when there are many samples.
Joint Clustering and Matching (JCM) is a multi-level framework for simultaneous
modeling and registration of populations across a cohort. JCM models every
population with a robust multivariate probability distribution. Simultaneously,
JCM fits a random-effects model to construct an overall batch template -- used
for registering populations across samples, and classifying new samples. By
tackling systems-level variation, JCM supports practical biomedical
applications involving large cohorts.
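JCM's random-effects template can be pictured as an overall reference to which each sample's populations are registered. A toy sketch, assuming a single population per sample and a pure mean-shift batch effect (a drastic simplification of JCM's robust multivariate model):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical cohort: the same population measured in two samples,
# each with its own batch shift (the "random effect" here).
base = np.array([2.0, -1.0])
shifts = [np.array([0.5, 0.0]), np.array([-0.3, 0.4])]
samples = [base + s + 0.1 * rng.standard_normal((200, 2)) for s in shifts]

# Overall template location: the average of the per-sample means.
overall = np.mean([s.mean(axis=0) for s in samples], axis=0)

# Register each sample by removing its estimated deviation from
# the template, so populations line up across the cohort.
registered = [s - (s.mean(axis=0) - overall) for s in samples]

for r in registered:
    print(np.round(r.mean(axis=0), 2))  # both sample means now coincide
```

After registration, populations from different samples can be compared directly, and a new sample can be classified against the template.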
Clustering with position-specific constraints on variance: Applying redescending M-estimators to label-free LC-MS data analysis
Background
Clustering is a widely applicable pattern recognition method for discovering groups of similar observations in data. While there are a large variety of clustering algorithms, very few of these can enforce constraints on the variation of attributes for data points included in a given cluster. In particular, a clustering algorithm that can limit variation within a cluster according to that cluster's position (centroid location) can produce effective and optimal results in many important applications ranging from clustering of silicon pixels or calorimeter cells in high-energy physics to label-free liquid chromatography based mass spectrometry (LC-MS) data analysis in proteomics and metabolomics.
Results
We present MEDEA (M-Estimator with DEterministic Annealing), a new unsupervised, M-estimator-based algorithm designed to enforce position-specific constraints on variance during the clustering process. The utility of MEDEA is demonstrated by applying it to the problem of "peak matching" (identifying the common LC-MS peaks across multiple samples) in proteomic biomarker discovery. Using real-life datasets, we show that MEDEA not only outperforms current state-of-the-art model-based clustering methods, but also yields an implementation that is significantly more efficient, and hence applicable to much larger LC-MS data sets.
Conclusions
MEDEA is an effective and efficient solution to the problem of peak matching in label-free LC-MS data. The program implementing the MEDEA algorithm, including datasets, clustering results, and supplementary information, is available from the author's website at http://www.hephy.at/user/fru/medea/
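The robustness of redescending M-estimators comes from giving zero weight to points far from a centroid, so gross outliers cannot pull cluster positions. A sketch of that mechanism for a single 1-D cluster, using the Tukey biweight (the constant `c` and the data are illustrative, not taken from the paper):

```python
import numpy as np

def tukey_weights(residuals, c):
    """Redescending Tukey biweight: weight falls smoothly to zero
    at |residual| = c, so far-off points are ignored entirely."""
    u = residuals / c
    w = (1.0 - u**2) ** 2
    w[np.abs(u) >= 1.0] = 0.0
    return w

def robust_center(x, c=3.0, n_iter=20):
    """Iteratively reweighted estimate of a 1-D cluster center.
    The cutoff c acts like a position-specific variance constraint:
    it caps how far a point may sit from the centroid and still count."""
    mu = np.median(x)  # robust starting point
    for _ in range(n_iter):
        w = tukey_weights(x - mu, c)
        mu = np.sum(w * x) / np.sum(w)
    return mu

x = np.concatenate([np.random.default_rng(2).normal(10.0, 0.5, 100),
                    np.array([50.0, 60.0])])  # two gross outliers
print(robust_center(x))  # ~10.0; the plain mean is pulled up to ~10.9
```

In a full clustering algorithm such a weight function would apply per cluster, with the cutoff set from the cluster's position, which is the constraint MEDEA enforces.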
Laser Microdissection of the Alveolar Duct Enables Single-Cell Genomic Analysis
Complex tissues such as the lung are composed of structural hierarchies such as alveoli, alveolar ducts, and lobules. Some structural units, such as the alveolar duct, appear to participate in tissue repair as well as the development of bronchioalveolar carcinoma. Here, we demonstrate an approach to conduct laser microdissection of the lung alveolar duct for single-cell PCR analysis. Our approach involved three steps. (1) The initial preparation used mechanical sectioning of the lung tissue with sufficient thickness to encompass the structure of interest. In the case of the alveolar duct, the precision-cut lung slices were 200 μm thick; the slices were processed under near-physiologic conditions to preserve the state of viable cells. (2) The lung slices were examined by transmission light microscopy to target the alveolar duct. The air-filled lung was sufficiently accessible by light microscopy that counterstains or fluorescent labels were unnecessary to identify the alveolar duct. (3) The enzymatic and microfluidic isolation of single cells allowed for the harvest of as few as several thousand cells for PCR analysis. Microfluidics-based arrays were used to measure the expression of selected marker genes in individual cells to characterize different cell populations. Preliminary work suggests the unique value of this approach for understanding the intra- and intercellular interactions within the regenerating alveolar duct.
On the Probabilities of Environmental Extremes
Environmental researchers, as well as epidemiologists, often encounter the problem of determining the probability of exceeding a high threshold of a variable of interest based on observations that are much smaller than the threshold; moreover, the data available for the task may be only of moderate size. This generic problem is addressed by repeatedly fusing the real data with synthetic, computer-generated samples. The threshold probability of interest is approximated by certain subsequences created by an iterative algorithm that gives precise estimates. The method is illustrated on environmental data, including monitoring data of nitrogen dioxide levels in the air.
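The paper's fusion-based algorithm is not reproduced here, but the underlying task, estimating P(X > u) for a threshold u beyond the observed data, can be illustrated with a standard peaks-over-threshold extrapolation. The sketch assumes an exponential tail (a generalized Pareto with shape 0), and all numbers are synthetic:

```python
import numpy as np

def tail_prob(data, u_fit, u_target):
    """Peaks-over-threshold estimate of P(X > u_target) for a target
    far above most observations, assuming an exponential tail above
    the moderate threshold u_fit."""
    excess = data[data > u_fit] - u_fit
    p_u = len(excess) / len(data)        # empirical P(X > u_fit)
    beta = excess.mean()                 # MLE scale of the exponential tail
    # Extrapolate: P(X > u_target) = P(X > u_fit) * P(excess > gap).
    return p_u * np.exp(-(u_target - u_fit) / beta)

rng = np.random.default_rng(3)
x = rng.exponential(1.0, 10_000)         # synthetic "monitoring" data
est = tail_prob(x, u_fit=2.0, u_target=8.0)
print(est)  # close to the true value exp(-8), about 3.4e-4
```

The estimate is reasonable here because the assumed tail shape is correct; with real monitoring data the tail model itself must be checked, which is part of what makes the problem the abstract describes difficult.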
