
    Bayesian meta-analysis for identifying periodically expressed genes in fission yeast cell cycle

    The effort to identify genes with periodic expression during the cell cycle from genome-wide microarray time series data has been ongoing for a decade. However, the lack of rigorous modeling of periodic expression, as well as the lack of a comprehensive model for integrating information across genes and experiments, has impaired the accurate identification of periodically expressed genes. To address this problem, we introduce a Bayesian model that integrates multiple independent microarray data sets from three recent genome-wide cell cycle studies on fission yeast, using a hierarchical model for data integration. To enable efficient Monte Carlo sampling from the joint posterior distribution, we develop a novel Metropolis--Hastings group move. A surprising finding from our integrated analysis is that more than 40% of the genes in fission yeast are significantly periodically expressed, far exceeding the 10--15% reported in the current literature. This calls for a reconsideration of the problem of detecting periodically expressed genes. Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics at http://dx.doi.org/10.1214/09-AOAS300.
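    As a toy illustration of the kind of sampler involved, the sketch below runs a random-walk Metropolis--Hastings chain over the amplitude and phase of a cosine model fit to synthetic expression data. The data, likelihood, and tuning constants are all hypothetical stand-ins, not the paper's hierarchical model or its group move.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical stand-in for one gene's expression over the cell cycle:
    # a cosine signal plus Gaussian noise (not the paper's data or model).
    t = np.linspace(0.0, 2.0 * np.pi, 30)
    y = 1.5 * np.cos(t) + rng.normal(0.0, 0.5, size=t.size)

    def log_posterior(amp, phase):
        # Gaussian likelihood (known noise variance 0.25) with flat priors.
        resid = y - amp * np.cos(t + phase)
        return -0.5 * np.sum(resid ** 2) / 0.25

    # Random-walk Metropolis-Hastings over (amplitude, phase).
    amp, phase = 0.0, 0.0
    amp_samples = []
    for _ in range(5000):
        amp_prop = amp + rng.normal(0.0, 0.1)
        phase_prop = phase + rng.normal(0.0, 0.1)
        if np.log(rng.uniform()) < log_posterior(amp_prop, phase_prop) - log_posterior(amp, phase):
            amp, phase = amp_prop, phase_prop
        amp_samples.append(amp)

    posterior_amp = float(np.mean(amp_samples[1000:]))  # discard burn-in
    ```

    With enough iterations the posterior mean of the amplitude concentrates near the true value used to generate the data; a significantly non-zero amplitude is one simple signal of periodic expression.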

    Matching Phosphorylation Response Patterns of Antigen-Receptor-Stimulated T Cells Via Flow Cytometry

    Background: When flow cytometric data on mixtures of cell populations are collected from samples under different experimental conditions, computational methods are needed (a) to classify the samples into similar groups, and (b) to characterize the changes within the corresponding populations due to the different conditions. Manual inspection has been used in the past to study such changes, but high-dimensional experiments necessitate new computational approaches to this problem. A robust solution is to construct distinct templates summarizing all samples from a class, and then to compare these templates to study the changes across classes or conditions. Results: We designed a hierarchical algorithm, flowMatch, to first match the corresponding clusters across samples to produce robust meta-clusters, and then to construct a high-dimensional template as a collection of meta-clusters for each class of samples. We applied the algorithm to flow cytometry data obtained from human blood cells before and after stimulation with anti-CD3 monoclonal antibody, which is reported to change the phosphorylation responses of memory and naive T cells. The flowMatch algorithm is able to construct representative templates from the samples before and after stimulation, and to match corresponding meta-clusters across templates. The templates of the pre-stimulation and post-stimulation data corresponding to memory and naive T cell populations clearly show, at the level of the meta-clusters, the overall phosphorylation shift due to the stimulation. Conclusions: We concisely represent each class of samples by a template consisting of a collection of meta-clusters (representative abstract populations). Using flowMatch, the meta-clusters across samples can be matched to assess overall differences among samples of various phenotypes or time-points.
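    The core matching step can be illustrated with a minimal sketch: given cluster centroids from two samples (all numbers below are hypothetical), match them by a minimum-cost one-to-one assignment on centroid distances. This is only an illustration of the idea, not the flowMatch algorithm itself.

    ```python
    import itertools
    import numpy as np

    # Hypothetical cluster centroids (mean marker expression) for two samples.
    sample_a = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
    sample_b = np.array([[10.2, 0.1], [0.1, -0.2], [5.2, 4.8]])

    # Pairwise Euclidean distances between the two sets of centroids.
    cost = np.linalg.norm(sample_a[:, None, :] - sample_b[None, :, :], axis=2)

    # Brute-force the best one-to-one matching (fine for a handful of clusters).
    best = min(itertools.permutations(range(3)),
               key=lambda p: sum(cost[i, p[i]] for i in range(3)))
    # best[i] is the index of the cluster in sample_b matched to cluster i
    # in sample_a; here each centroid pairs with its nearest counterpart.
    ```

    For many clusters, a polynomial-time assignment solver (e.g. the Hungarian algorithm) would replace the brute-force search; averaging matched centroids across samples then yields the meta-clusters that make up a template.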

    Joint Modeling and Registration of Cell Populations in Cohorts of High-Dimensional Flow Cytometric Data

    In systems biomedicine, an experimenter encounters different potential sources of variation in data, such as individual samples, multiple experimental conditions, and multi-variable network-level responses. In multiparametric cytometry, which is often used for analyzing patient samples, such issues are critical. While computational methods can identify cell populations in individual samples, without the ability to automatically match them across samples it is difficult to compare and characterize the populations in typical experiments, such as those responding to various stimulations or distinctive of particular patients or time-points, especially when there are many samples. Joint Clustering and Matching (JCM) is a multi-level framework for simultaneous modeling and registration of populations across a cohort. JCM models every population with a robust multivariate probability distribution. Simultaneously, JCM fits a random-effects model to construct an overall batch template, which is used for registering populations across samples and for classifying new samples. By tackling systems-level variation, JCM supports practical biomedical applications involving large cohorts.
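    The random-effects intuition behind a batch template can be sketched in one dimension per marker: each sample's cluster mean is treated as the template plus a sample-level offset plus noise, and registration shrinks each sample toward the template. The simulated means, variances, and shrinkage rule below are illustrative assumptions, not JCM's actual model.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    # Toy setup: each sample's cluster mean = template + random offset.
    # All numbers are illustrative, not from the paper.
    template = np.array([2.0, -1.0])
    tau2, sigma2 = 0.3 ** 2, 0.5 ** 2   # between- and within-sample variance
    sample_means = template + rng.normal(0.0, np.sqrt(tau2 + sigma2), size=(6, 2))

    grand = sample_means.mean(axis=0)   # estimated batch template
    shrink = tau2 / (tau2 + sigma2)     # shrinkage factor toward the template
    registered = grand + shrink * (sample_means - grand)
    # Registered means are pulled toward the template, damping sample noise.
    ```

    The shrinkage factor is the classical random-effects weighting: the noisier a sample is relative to the true between-sample spread, the harder it is pulled toward the batch template.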

    Clustering with position-specific constraints on variance: Applying redescending M-estimators to label-free LC-MS data analysis

    Background: Clustering is a widely applicable pattern recognition method for discovering groups of similar observations in data. While there is a large variety of clustering algorithms, very few can enforce constraints on the variation of attributes for data points included in a given cluster. In particular, a clustering algorithm that can limit variation within a cluster according to that cluster's position (centroid location) can produce effective and optimal results in many important applications, ranging from clustering of silicon pixels or calorimeter cells in high-energy physics to label-free liquid chromatography based mass spectrometry (LC-MS) data analysis in proteomics and metabolomics. Results: We present MEDEA (M-Estimator with DEterministic Annealing), a new unsupervised, M-estimator-based algorithm designed to enforce position-specific constraints on variance during the clustering process. The utility of MEDEA is demonstrated by applying it to the problem of "peak matching" -- identifying the common LC-MS peaks across multiple samples -- in proteomic biomarker discovery. Using real-life datasets, we show that MEDEA not only outperforms current state-of-the-art model-based clustering methods, but also yields an implementation that is significantly more efficient, and hence applicable to much larger LC-MS data sets. Conclusions: MEDEA is an effective and efficient solution to the problem of peak matching in label-free LC-MS data. The program implementing the MEDEA algorithm, including datasets, clustering results, and supplementary information, is available from the authors' website at http://www.hephy.at/user/fru/medea/.
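    A minimal sketch of the redescending M-estimator idea follows, with a weight cutoff that shrinks across passes as a crude stand-in for an annealing schedule. All constants are illustrative assumptions, and this is not the MEDEA implementation.

    ```python
    import numpy as np

    def tukey_biweight(r, c):
        # Redescending Tukey biweight: the weight drops to zero beyond the
        # cutoff c, so distant points lose all influence on the centroid.
        w = (1.0 - (r / c) ** 2) ** 2
        return np.where(np.abs(r) < c, w, 0.0)

    # One-dimensional toy cluster with an outlier (illustrative numbers only).
    x = np.array([1.0, 1.1, 0.9, 1.05, 8.0])

    mu = x.mean()              # naive centroid, dragged toward the outlier
    for c in (5.0, 3.0, 1.0):  # shrinking cutoff: a crude annealing schedule
        for _ in range(10):    # iteratively reweighted mean at this cutoff
            w = tukey_biweight(x - mu, c)
            mu = float(np.sum(w * x) / np.sum(w))
    # mu settles near 1.0; the outlier at 8.0 ends up with zero weight
    ```

    A position-specific constraint of the kind the abstract describes would make the cutoff `c` a function of the centroid location (e.g. wider tolerance at higher m/z), rather than a global constant.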

    Laser Microdissection of the Alveolar Duct Enables Single-Cell Genomic Analysis

    Complex tissues such as the lung are composed of structural hierarchies such as alveoli, alveolar ducts, and lobules. Some structural units, such as the alveolar duct, appear to participate in tissue repair as well as in the development of bronchioalveolar carcinoma. Here, we demonstrate an approach to conduct laser microdissection of the lung alveolar duct for single-cell PCR analysis. Our approach involved three steps. (1) The initial preparation used mechanical sectioning of the lung tissue with sufficient thickness to encompass the structure of interest. In the case of the alveolar duct, the precision-cut lung slices were 200 μm thick; the slices were processed under near-physiologic conditions to preserve the state of viable cells. (2) The lung slices were examined by transmission light microscopy to target the alveolar duct. The air-filled lung was sufficiently accessible by light microscopy that counterstains or fluorescent labels were unnecessary to identify the alveolar duct. (3) The enzymatic and microfluidic isolation of single cells allowed for the harvest of as few as several thousand cells for PCR analysis. Microfluidics-based arrays were used to measure the expression of selected marker genes in individual cells to characterize different cell populations. Preliminary work suggests the unique value of this approach for understanding the intra- and intercellular interactions within the regenerating alveolar duct.

    On the Probabilities of Environmental Extremes

    Environmental researchers, as well as epidemiologists, often encounter the problem of determining the probability that a variable of interest exceeds a high threshold, based on observations that are much smaller than the threshold. Moreover, the data available for the task may be of only moderate size. This generic problem is addressed by repeatedly fusing the real data with synthetic, computer-generated samples. The threshold probability of interest is approximated by certain subsequences created by an iterative algorithm that gives precise estimates. The method is illustrated using environmental data, including monitoring data of nitrogen dioxide levels in the air.
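    A standard peaks-over-threshold extrapolation gives a feel for the problem of estimating an exceedance probability beyond the range of the data. This is a generic tail-model sketch, not the data-fusion algorithm described above, and all numbers are synthetic.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical stand-in for a moderate environmental sample (e.g. NO2
    # concentrations); exponential data are used only so the tail model is exact.
    x = rng.exponential(scale=10.0, size=300)

    # Model exceedances above a moderate threshold u with an exponential tail,
    # then extrapolate to a high threshold T lying beyond most of the data.
    u = float(np.quantile(x, 0.9))
    excess = x[x > u] - u
    beta = float(excess.mean())          # MLE of the exponential tail scale
    p_u = float(np.mean(x > u))          # empirical P(X > u)

    T = 60.0
    p_T = p_u * np.exp(-(T - u) / beta)  # estimated P(X > T)
    ```

    The estimate multiplies the empirical probability of exceeding a threshold the data can actually resolve by a fitted tail probability for the remaining distance, which is exactly the regime the abstract targets: a high threshold, observations well below it, and only a moderate sample size.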