185 research outputs found
Recommended from our members
The Level of Residual Dispersion Variation and the Power of Differential Expression Tests for RNA-Seq Data
RNA-Sequencing (RNA-Seq) has been widely adopted for quantifying gene expression changes in comparative transcriptome analysis. For detecting differentially expressed genes, a variety of statistical methods based on the negative binomial (NB) distribution have been proposed. These methods differ in the ways they handle the NB nuisance parameters (i.e., the dispersion parameters associated with each gene) to save power, such as by using a dispersion model to exploit an apparent relationship between the dispersion parameter and the NB mean. Presumably, dispersion models with fewer parameters will result in greater power if the models are correct, but will produce misleading conclusions if not. This paper investigates this power and robustness trade-off by assessing rates of identifying true differential expression using the various methods under realistic assumptions about NB dispersion parameters. Our results indicate that the relative performances of the different methods are closely related to the level of dispersion variation unexplained by the dispersion model. We propose a simple statistic to quantify the level of residual dispersion variation from a fitted dispersion model and show that the magnitude of this statistic indicates whether, and by how much, a dispersion-modeling approach can increase statistical power.
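For orientation, the NB mean-variance relationship and one common log-linear form of dispersion model can be written as follows. The symbols (mean \mu, dispersion \phi, coefficients \alpha_0 and \alpha_1, residual \varepsilon with standard deviation \sigma) and the normal residual term are illustrative assumptions, not notation confirmed by the abstract; under this sketch, \sigma plays the role of the level of residual dispersion variation.

```latex
\operatorname{Var}(Y_{ij}) = \mu_{ij} + \phi_{ij}\,\mu_{ij}^{2},
\qquad
\log \phi_{ij} = \alpha_0 + \alpha_1 \log \mu_{ij} + \varepsilon_i,
\qquad
\varepsilon_i \sim N(0, \sigma^{2})
```

When \sigma is close to zero the dispersion model explains nearly all dispersion variation across genes; larger values mean more variation is left unexplained.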
Goodness-of-Fit Tests and Model Diagnostics for Negative Binomial Regression of RNA Sequencing Data
This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.
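The general idea of a simulation-based adequacy check can be sketched as below. This is a generic Monte Carlo goodness-of-fit test built on a Pearson statistic, not the specific tests or diagnostic graphics the article proposes. The function names, the Pearson statistic, and the treatment of the mean `mu` and dispersion `phi` as known are all assumptions made for illustration; in practice both would be estimated from the data.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate (Knuth's method; moderate lam only)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_nb(mu, phi, rng):
    """Draw one NB count with mean mu and variance mu + phi * mu**2.

    Uses the gamma-Poisson mixture: a gamma rate with shape 1/phi and
    scale mu*phi, followed by a Poisson draw at that rate.
    """
    lam = rng.gammavariate(1.0 / phi, mu * phi)
    return sample_poisson(lam, rng)


def pearson_statistic(counts, mu, phi):
    """Sum of squared Pearson residuals under the hypothesized NB model."""
    var = mu + phi * mu ** 2
    return sum((y - mu) ** 2 / var for y in counts)


def monte_carlo_gof_pvalue(counts, mu, phi, n_sim=199, seed=0):
    """Monte Carlo p-value: share of simulated statistics >= the observed one."""
    rng = random.Random(seed)
    observed = pearson_statistic(counts, mu, phi)
    exceed = 0
    for _ in range(n_sim):
        simulated = [sample_nb(mu, phi, rng) for _ in counts]
        if pearson_statistic(simulated, mu, phi) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


# Hypothetical counts for one gene across six samples.
p_value = monte_carlo_gof_pvalue([8, 12, 9, 15, 7, 11], mu=10.0, phi=0.1)
print(f"Monte Carlo p-value: {p_value:.3f}")
```

A small p-value suggests the observed counts are more variable than the hypothesized mean-variance relationship allows.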
Differential Expression of Genes Involved in Host Recognition, Attachment, and Degradation in the Mycoparasite Tolypocladium ophioglossoides
This is the publisher’s final PDF. The published article is copyrighted by the author(s) and published by the Genetics Society of America. The published article can be found at: https://doi.org/10.1534/g3.116.027045
The ability of a fungus to infect novel hosts is dependent on changes in gene content, expression, or regulation. Examining gene expression under simulated host conditions can explore which genes may contribute to host jumping. Insect pathogenesis is the inferred ancestral character state for species of Tolypocladium; however, several species are parasites of truffles, including Tolypocladium ophioglossoides. To identify potentially crucial genes in this interkingdom host switch, T. ophioglossoides was grown on four media conditions: media containing the inner and outer portions of its natural host (truffles of Elaphomyces), cuticles from an ancestral host (beetle), and a rich medium (Yeast Malt). Through high-throughput RNASeq of mRNA from these conditions, many differentially expressed genes were identified in the experiment. These included PTH11-related G-protein-coupled receptors (GPCRs) hypothesized to be involved in host recognition, and also found to be upregulated in insect pathogens. A divergent chitinase with a signal peptide was also found to be highly upregulated on media containing truffle tissue, suggesting an exogenous degradative activity in the presence of the truffle host. The adhesin gene, Mad1, was highly expressed on truffle media as well. A BiNGO analysis of overrepresented GO terms from genes expressed during each growth condition found that genes involved in redox reactions and transmembrane transport were the most overrepresented during T. ophioglossoides growth on truffle media, suggesting their importance in growth on fungal tissue as compared to other hosts and environments. Genes involved in secondary metabolism were most highly expressed during growth on insect tissue, suggesting that their products may not be necessary during parasitism of Elaphomyces. This study provides clues into understanding genetic mechanisms underlying the transition from insect to truffle parasitism.
Construction of an evaluation system for the effectiveness of rural sewage treatment facilities and empirical research
Introduction: Rural domestic sewage treatment is an important starting point for improving the quality of the rural ecological environment, an important part of new rural construction, and an inherent requirement for promoting rural economic development. Rural sewage treatment facilities often operate poorly and lack long-term operation guarantees and supervision mechanisms. Research on an evaluation index system, evaluation method, and evaluation benchmark for the operational effectiveness of these facilities is urgently needed.
Methods: This article used rural sewage treatment facilities in a city in northern China as the research object and constructed an evaluation method for the operational effectiveness of rural sewage treatment facilities. Evaluation indexes were selected from three perspectives (economy, technology, and management) and divided into two stages (planning and operation). A judgment matrix was constructed using the analytic hierarchy process (AHP), and index weights were calculated using Yaahp10.3 software to determine the evaluation criteria. Fifteen rural sewage treatment stations were selected to evaluate their planning and operation effectiveness.
Results: The weight assignment shows that the weights of the COD removal rate, operating load rate, and operating cost indexes are high, which is in line with the actual evaluation of the effectiveness of rural sewage treatment facilities at different stages. In the empirical calculation, 7 facilities had a comprehensive score of more than 80 points and 8 scored 60–80 points, with an average score of 79.05 points; scores in the operation stage were better overall than those in the planning stage, and the overall operation effect was good.
Discussion: The calculation results were consistent with the actual operation, verifying the scientific soundness and usability of the selected indices, the evaluation method constructed, and the evaluation benchmark determined. The research results can provide technical methods for evaluating the operational effectiveness of rural sewage treatment facilities in similar areas and provide technical support for the planning, design, optimization, upgrading, and transformation of rural sewage treatment plants.
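The AHP weighting step can be illustrated with a minimal sketch. The abstract does not say how Yaahp derives weights from the judgment matrix, so the row geometric-mean approximation used here is a swapped-in technique, and the pairwise comparison values, function names, and random index table are assumptions for illustration only.

```python
import math

# Commonly cited random consistency index (RI) values by matrix size
# (assumed here; published tables differ slightly).
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(matrix):
    """Approximate AHP priority weights by normalizing row geometric means."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Consistency ratio CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    # Estimate lambda_max as the average of (A w)_i / w_i over all rows.
    ratios = [
        sum(a * w for a, w in zip(row, weights)) / weights[i]
        for i, row in enumerate(matrix)
    ]
    lambda_max = sum(ratios) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


# Hypothetical judgment matrix over the three perspectives named in the
# abstract; entry [i][j] says how much more important criterion i is than j.
criteria = ["economy", "technology", "management"]
judgments = [
    [1.0, 2.0, 4.0],
    [1 / 2, 1.0, 2.0],
    [1 / 4, 1 / 2, 1.0],
]

weights = ahp_weights(judgments)
cr = consistency_ratio(judgments, weights)
for name, weight in zip(criteria, weights):
    print(f"{name}: {weight:.3f}")
print(f"consistency ratio: {cr:.3f}")
```

By the usual convention, a consistency ratio below 0.1 is taken to mean the pairwise judgments are acceptably consistent.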
Length Bias Correction in Gene Ontology Enrichment Analysis Using Logistic Regression
When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
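One way such a model could be written is shown below. Here G_i indicates whether gene i belongs to the Gene Ontology category, s_i is some measure of the significance of its differential expression test, and L_i is its (possibly transformed) transcript length. The choice of response variable, the symbols, and the absence of any transformation are assumptions for illustration, not details given in the abstract.

```latex
\operatorname{logit} \Pr(G_i = 1) = \beta_0 + \beta_1 s_i + \beta_2 L_i
```

Under this sketch, enrichment adjusted for length corresponds to testing \beta_1 = 0 with L_i held in the model, so that any association explained by length alone is absorbed by \beta_2.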
Higher order asymptotics for negative binomial regression inferences from RNA-sequencing data
RNA sequencing (RNA-Seq) is the current method of choice for characterizing transcriptomes and quantifying gene expression changes. This next generation sequencing-based method provides unprecedented depth and resolution. The negative binomial (NB) probability distribution has been shown to be a useful model for frequencies of mapped RNA-Seq reads and consequently provides a basis for statistical analysis of gene expression. Negative binomial exact tests are available for two-group comparisons but do not extend to negative binomial regression analysis, which is important for examining gene expression as a function of explanatory variables and for adjusted group comparisons accounting for other factors. We address the adequacy of available large-sample tests for the small sample sizes typically available from RNA-Seq studies and consider a higher-order asymptotic (HOA) adjustment to likelihood ratio tests. We demonstrate that 1) the HOA-adjusted likelihood ratio test is practically indistinguishable from the exact test in situations where the exact test is available, 2) the type I error of the HOA test matches the nominal specification in regression settings we examined via simulation, and 3) the power of the likelihood ratio test does not appear to be affected by the HOA adjustment. This work helps clarify the accuracy of the unadjusted likelihood ratio test and the degree of improvement available with the HOA adjustment. Furthermore, the HOA test may be preferable even when the exact test is available because it does not require ad hoc library size adjustments.
Keywords: Regression, RNA-Seq, Overdispersion, Extra-Poisson variation, Negative binomial, Higher-order asymptotic
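As background, a higher-order adjustment to a likelihood ratio test for a scalar parameter of interest \psi is often expressed through the signed root of the likelihood ratio statistic and Barndorff-Nielsen's modified version of it. The notation below is generic and assumed for illustration; the abstract does not state which form of adjustment the authors use or how the correction term u is computed.

```latex
\lambda = 2\left\{ \ell(\hat{\theta}) - \ell(\tilde{\theta}_{\psi_0}) \right\},
\qquad
r = \operatorname{sign}(\hat{\psi} - \psi_0)\,\sqrt{\lambda},
\qquad
r^{*} = r - \frac{1}{r}\,\log\frac{r}{u}
```

Here \hat{\theta} is the unrestricted maximum likelihood estimate, \tilde{\theta}_{\psi_0} is the estimate under the null value \psi_0, and both r and r^{*} are referred to a standard normal distribution, with r^{*} typically the more accurate of the two in small samples.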
