
    Gene set analysis exploiting the topology of a pathway

    BACKGROUND: Recently, much effort in microarray data analysis has been directed towards the study of so-called gene sets. A gene set is defined by genes that are, in some way, functionally related. For example, the genes appearing in a known biological pathway naturally define a gene set. Gene sets are usually identified from a priori biological knowledge, and many bioinformatics resources now store this kind of knowledge (see, for example, the Kyoto Encyclopedia of Genes and Genomes, among others). Although pathway maps carry important information about the correlation structure among genes that should not be neglected, the currently available multivariate methods for gene set analysis do not fully exploit it. RESULTS: We propose a novel gene set analysis specifically designed for gene sets defined by pathways. This analysis, based on graphical models, explicitly incorporates the dependence structure among genes highlighted by the topology of pathways. It is designed for overall surveillance of changes in a pathway across different experimental conditions. Under different circumstances, not only the expression of the genes in a pathway but also the strength of their relations may change. The resulting methods allow one both to test for variations in the strength of the links and to properly account for heteroscedasticity in the usual tests for differential expression. CONCLUSIONS: The use of graphical models allows a deeper look at the components of the pathway, which can be tested separately and compared marginally. In this way it is possible to test single components of the pathway and highlight only those involved in its deregulation.

    Unraveling genetic predisposition to familial or early onset gastric cancer using germline whole-exome sequencing

    Recognition of individuals with a genetic predisposition to gastric cancer (GC) enables preventive measures. However, the underlying cause of genetic susceptibility to gastric cancer remains largely unexplained. We performed germline whole-exome sequencing on leukocyte DNA of 54 patients from 53 families with genetically unexplained diffuse-type and intestinal-type GC to identify novel GC-predisposing candidate genes. As young age at diagnosis and familial clustering are hallmarks of genetic tumor susceptibility, we selected patients who were diagnosed under the age of 35, patients from families with two cases of GC at or below age 60, and patients from families with three GC cases at or below age 70. All included individuals had tested negative for germline CDH1 mutations before or during the study. Variants predicted in silico to be possibly deleterious were filtered using several independent approaches based on gene function and on gene mutation burden in controls. Despite a rigorous search, no obvious candidate GC predisposition genes were identified. This negative result stresses the importance of future research studies in large, homogeneous cohorts.

    Improving gene-set enrichment analysis of RNA-Seq data with small replicates

    Deregulated pathways identified from transcriptome data of two sample groups have played a key role in many genomic studies. Gene-set enrichment analysis (GSEA) has been commonly used for pathway or functional analysis of microarray data, and it is also being applied to RNA-seq data. However, most RNA-seq datasets so far have only a small number of replicates. This forces the use of the gene-permuting GSEA method (or preranked GSEA), which produces a large number of false positives because of the inter-gene correlation within each gene set. We demonstrate that incorporating the absolute gene statistic in one-tailed GSEA considerably improves the false-positive control and the overall discriminatory ability of the gene-permuting GSEA methods for RNA-seq data. To test performance, we developed a new simulation method that generates correlated read counts within a gene set and compared a dozen currently available RNA-seq enrichment analysis methods; the proposed methods outperformed the others, which do not account for inter-gene correlation. Analysis of real RNA-seq data also supported the proposed methods in terms of false-positive control, ranks of true positives, and biological relevance. An efficient R package (AbsFilterGSEA) coded in C++ (Rcpp) is available from CRAN.
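The gene-permuting idea with absolute statistics can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the AbsFilterGSEA implementation: it uses a weighted Kolmogorov-Smirnov-style running sum (as in standard GSEA) on genes ranked by the absolute value of their statistic, keeps only the maximum positive deviation (one-tailed), and draws the null distribution from random gene sets of the same size. The function names `enrichment_score` and `gene_permutation_pvalue` and the dict-of-statistics input are hypothetical.

```python
import random


def enrichment_score(stats, gene_set, weight=1.0):
    """One-tailed weighted running-sum enrichment score (illustrative sketch).

    stats: dict mapping gene -> gene-level statistic.
    gene_set: iterable of gene identifiers.
    Genes are ranked by statistic (descending); the score is the maximum
    positive deviation of the running sum, so only enrichment at the top counts.
    """
    members = set(gene_set)
    ranked = sorted(stats, key=lambda g: stats[g], reverse=True)
    n_total = len(ranked)
    hits = [g for g in ranked if g in members]
    n_hit = len(hits)
    hit_norm = sum(abs(stats[g]) ** weight for g in hits)
    if n_hit == 0 or n_hit == n_total or hit_norm == 0:
        return 0.0
    miss_step = 1.0 / (n_total - n_hit)
    running = best = 0.0
    for g in ranked:
        if g in members:
            running += abs(stats[g]) ** weight / hit_norm  # step up at a member
        else:
            running -= miss_step  # step down at a non-member
        best = max(best, running)
    return best


def gene_permutation_pvalue(stats, gene_set, n_perm=1000, seed=0, absolute=True):
    """Gene-permutation p-value: compare the observed score with the scores of
    random gene sets of the same size drawn from all measured genes.
    With absolute=True, statistics are replaced by their absolute values before
    ranking, so up- and down-regulated members both count as signal."""
    scored = {g: abs(s) for g, s in stats.items()} if absolute else dict(stats)
    genes = list(scored)
    size = len(set(gene_set) & set(genes))
    observed = enrichment_score(scored, gene_set)
    rng = random.Random(seed)
    exceed = sum(
        enrichment_score(scored, rng.sample(genes, size)) >= observed
        for _ in range(n_perm)
    )
    return observed, (exceed + 1) / (n_perm + 1)
```

Because random gene sets break the correlation among member genes, this is the kind of null the abstract describes as prone to false positives; ranking by the absolute statistic is the adjustment it reports as helping. A set whose members change in both directions scores high here, whereas a signed ranking would split them between the two ends of the list.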

    Microarray-based gene set analysis: a comparison of current methods

    BACKGROUND: The analysis of gene sets has become a popular topic in recent times, with researchers attempting to improve the interpretability and reproducibility of their microarray analyses through the inclusion of supplementary biological information. While a number of options for gene set analysis exist, no consensus has yet been reached regarding which methodology performs best, and under what conditions. The goal of this work was to examine the performance characteristics of a collection of existing gene set analysis methods on both simulated and real microarray data sets. Of particular interest was the potential utility gained through the incorporation of inter-gene correlation into the analysis process. RESULTS: Each of six gene set analysis methods was applied to both simulated and publicly available microarray data sets. Overall, the methodologies were all found to be better at detecting gene sets that moved from non-active (i.e., genes not expressed) to active states (or vice versa) than those that simply changed their level of activity. Methods that incorporate correlation structures were found to provide increased ability to detect altered gene sets in some settings. CONCLUSION: Based on the results obtained through the analysis of simulated data, it is clear that the performance of gene set analysis methods is strongly influenced by the features of the data set in question, and that methods that incorporate correlation structures into the analysis process tend to achieve better performance than methods that rely on univariate test statistics.

    Investigating the effect of paralogs on microarray gene-set analysis

    Background: In order to interpret the results obtained from a microarray experiment, researchers often shift focus from analysis of individual differentially expressed genes to analyses of sets of genes. These gene-set analysis (GSA) methods use previously accumulated biological knowledge to group genes into sets and then aim to rank these gene sets in a way that reflects their relative importance in the experimental situation in question. We suspect that the presence of paralogs affects the ability of GSA methods to accurately identify the most important sets of genes for subsequent research. Results: We show that paralogs, which typically have high sequence identity and similar molecular functions, also exhibit high correlation in their expression patterns. We investigate this correlation as a potential confounding factor common to current GSA methods using Indygene (http://www.cbio.uct.ac.za/indygene), a web tool that reduces a supplied list of genes so that it includes no pairwise paralogy relationships above a specified sequence similarity threshold. We use the tool to reanalyse previously published microarray datasets and determine the potential utility of accounting for the presence of paralogs. Conclusions: The Indygene tool efficiently removes paralogy relationships from a given dataset and we found that such a reduction, performed prior to GSA, has the ability to generate significantly different results that often represent novel and plausible biological hypotheses. This was demonstrated for three different GSA approaches when applied to the reanalysis of previously published microarray datasets and suggests that the redundancy and non-independence of paralogs is an important consideration when dealing with GSA methodologies.
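The reduction the abstract describes (no retained pair above a similarity threshold) can be illustrated with a simple greedy filter. This is a hypothetical Python sketch, not Indygene's actual algorithm, which the abstract does not specify; the function name `remove_paralogs` and the pairwise-identity dictionary format are assumptions.

```python
def remove_paralogs(genes, similarity, threshold):
    """Greedy paralog reduction (hypothetical sketch, not Indygene's algorithm).

    genes: gene identifiers in priority order (e.g. most significant first).
    similarity: dict mapping frozenset({gene_a, gene_b}) -> sequence identity (%).
    threshold: maximum identity allowed between any two retained genes.
    A gene is kept only if its identity with every already-kept gene is at or
    below the threshold; pairs missing from `similarity` are treated as unrelated.
    """
    kept = []
    for gene in genes:
        if all(similarity.get(frozenset((gene, k)), 0.0) <= threshold for k in kept):
            kept.append(gene)
    return kept
```

The result depends on the input order, which is why a priority ordering is assumed. Finding the largest paralog-free subset is a maximum independent set problem on the paralogy graph, which is NP-hard in general, so a greedy pass is a practical heuristic rather than an optimum.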

    Functional Analysis: Evaluation of Response Intensities - Tailoring ANOVA for Lists of Expression Subsets

    Background: Microarray data are frequently used to characterize the expression profile of a whole genome and to compare the characteristics of that genome under several conditions. Geneset analysis methods have been described previously that analyze the expression values of several genes related by known biological criteria (metabolic pathway, pathology signature, co-regulation by a common factor, etc.) at the same time, and the cost of these methods allows more values to be used to help discover the underlying biological mechanisms. Results: Because several methods assume different null hypotheses, we propose to reformulate the main question that biologists seek to answer. To determine which genesets are associated with expression values that differ between two experiments, we focused on three ad hoc criteria: expression levels, the direction of individual gene expression changes (up- or down-regulation), and correlations between genes. We introduce the FAERI methodology, tailored from a two-way ANOVA to examine these criteria. The significance of the results was evaluated according to the self-contained null hypothesis, using label sampling or by inferring the null distribution from normally distributed random data. Evaluations performed on simulated data revealed that FAERI outperforms currently available methods for each type of set tested. We then applied the FAERI method to analyze three real-world datasets on hypoxia response. FAERI was able to detect more genesets than other methodologies, and the genesets selected were coherent with current knowledge of cellular response to hypoxia. Moreover, the genesets selected by FAERI were confirmed when the analysis was repeated on two additional related datasets. Conclusions: The expression values of genesets are associated with several biological effects. The underlying mathematical structure of the genesets allows data from several genes to be analyzed at the same time. Focusing on expression levels, the direction of the expression changes, and correlations, we showed that two-step data reduction allowed us to significantly improve the performance of geneset analysis using a modified two-way ANOVA procedure and to detect genesets that current methods fail to detect.

    A Cross-Species Analysis of a Mouse Model of Breast Cancer-Specific Osteolysis and Human Bone Metastases Using Gene Expression Profiling

    Background: Breast cancer is the second leading cause of cancer-related death in women in the United States. During the advanced stages of disease, many breast cancer patients suffer from bone metastasis. These metastases are predominantly osteolytic and develop when tumor cells interact with bone. In vivo models that mimic the breast cancer-specific osteolytic bone microenvironment are limited. Previously, we developed a mouse model of tumor-bone interaction in which three mouse breast cancer cell lines were implanted onto the calvaria. Analysis of tumors from this model revealed that they exhibited strong bone resorption, induction of osteoclasts, and intracranial penetration at the tumor-bone (TB) interface. Methods: In this study, we identified and used a TB microenvironment-specific gene expression signature from this model to extend our understanding of the metastatic bone microenvironment in human disease and to predict potential therapeutic targets. Results: We identified a TB signature consisting of 934 genes that were commonly (among our 3 cell lines) and specifically (as compared to the tumor-alone area within the bone microenvironment) up- or down-regulated >2-fold at the TB interface in our mouse osteolytic model. By comparing the TB signature with gene expression profiles from human breast metastases and an in vitro osteoclast model, we demonstrate that our model mimics both the human breast cancer bone microenvironment and osteoclastogenesis. Furthermore, we observed enrichment in various signaling pathways specific to the TB interface; that is, TGF-β and myeloid self-renewal pathways were activated and the Wnt pathway was inactivated. Lastly, we used the TB signature to predict cyclopenthiazide as a potential inhibitor of the TB interface. Conclusion: Our mouse breast cancer model morphologically and genetically resembles the osteoclastic bone microenvironment observed in human disease. Characterization of the gene expression signature specific to the TB interface in our model revealed signaling mechanisms operative in human breast cancer metastases and predicted a therapeutic inhibitor of cancer-mediated osteolysis.
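The "common and specific" selection rule (more than 2-fold in the same direction at the TB interface relative to the tumor-alone area, in every cell line) can be sketched as a filter on fold changes. This is an illustrative Python sketch under assumptions: the abstract does not give the statistical details (replicates, significance tests), and the function name `tb_signature` and the input layout of one linear fold change per cell line are hypothetical.

```python
import math


def tb_signature(fold_changes, min_fold=2.0):
    """Select genes changed more than `min_fold` in the same direction in every cell line.

    fold_changes: dict mapping gene -> list of linear fold changes
        (TB interface / tumor-alone area), one value per cell line.
    Returns a dict mapping each selected gene to "up" or "down".
    """
    cutoff = math.log2(min_fold)
    signature = {}
    for gene, ratios in fold_changes.items():
        if not ratios:
            continue  # no measurements: cannot be "common" across cell lines
        if any(r <= 0 for r in ratios):
            raise ValueError(f"fold changes for {gene} must be positive")
        logs = [math.log2(r) for r in ratios]
        if all(v > cutoff for v in logs):  # consistently up-regulated
            signature[gene] = "up"
        elif all(v < -cutoff for v in logs):  # consistently down-regulated
            signature[gene] = "down"
    return signature
```

Working on the log2 scale makes the up and down thresholds symmetric (a 2-fold decrease is -1, a 2-fold increase is +1), and requiring the same sign in every cell line captures the "commonly" criterion.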

    A Systems Biology-Based Classifier for Hepatocellular Carcinoma Diagnosis

    AIM: The diagnosis of hepatocellular carcinoma (HCC) at an early stage is crucial to the application of curative treatments, which are the only hope for increasing the life expectancy of patients. Recently, several large-scale studies have shed light on this problem by analyzing gene expression profiles to identify markers correlated with HCC progression. However, those marker sets shared few genes and were poorly validated using independent data. Therefore, we developed a systems biology-based classifier that combines differential gene expression with topological features of human protein interaction networks to enhance HCC diagnosis. METHODS AND RESULTS: In the Oncomine platform, genes differentially expressed in HCC tissues relative to their corresponding normal tissues were filtered by a corrected Q value cut-off and Concept filters. The identified genes common to different microarray datasets were chosen as candidate markers. Their networks were then analyzed with GeneGO Meta-Core software, and the hub genes were chosen. An HCC diagnostic classifier was then constructed by Partial Least Squares modeling based on the microarray gene expression data of the hub genes. Validation of diagnostic performance showed that this classifier had high predictive accuracy (85.88% to 92.71%) and an area under the ROC curve approaching 1.0, and that the network topological features integrated into this classifier contribute greatly to improving predictive performance. Furthermore, this modeling strategy was shown to be applicable not only to HCC but also to other cancers. CONCLUSION: Our analysis suggests that a systems biology-based classifier combining differential gene expression and topological features of the human protein interaction network may enhance the diagnostic performance of HCC classifiers.
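The abstract does not say how hub genes were picked from the MetaCore networks. A common convention is to rank nodes by degree (number of distinct interaction partners), sketched below in Python as an assumption rather than as the authors' procedure; the function name `hub_genes`, the edge-list input, and the `top_k` cut-off are hypothetical, and the subsequent Partial Least Squares step is not shown.

```python
from collections import Counter


def hub_genes(edges, top_k):
    """Rank genes by degree in an undirected interaction network (illustrative sketch).

    edges: iterable of (gene_a, gene_b) interactions; direction is ignored.
    Returns the top_k genes by degree, ties broken alphabetically.
    """
    degree = Counter()
    seen = set()
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        key = frozenset((a, b))
        if key in seen:
            continue  # count each undirected interaction only once
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    ranked = sorted(degree, key=lambda g: (-degree[g], g))
    return ranked[:top_k]
```

Deduplicating edges matters because interaction databases often list the same pair in both orientations, which would otherwise double a gene's apparent degree.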

    Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges

    Pathway analysis has become the first choice for gaining insight into the underlying biology of differentially expressed genes and proteins, as it reduces complexity and has increased explanatory power. We discuss the evolution of knowledge base-driven pathway analysis over its first decade, distinctly divided into three generations. We also discuss the limitations that are specific to each generation, and how they are addressed by successive generations of methods. We identify a number of annotation challenges that must be addressed to enable development of the next generation of pathway analysis methods. Furthermore, we identify a number of methodological challenges that the next generation of methods must tackle to take advantage of the technological advances in genomics and proteomics in order to improve the specificity, sensitivity, and relevance of pathway analysis.