
    Model misspecification and bias for inverse probability weighting and doubly robust estimators

    In the causal inference literature an estimator belonging to a class of semi-parametric estimators is called robust if it has desirable properties under the assumption that at least one of the working models is correctly specified. In this paper we propose a crude analytical approach to study the large sample bias of semi-parametric estimators of the average causal effect when all working models are misspecified. We apply our approach to three prototypical estimators: two inverse probability weighting (IPW) estimators, using a misspecified propensity score model, and a doubly robust (DR) estimator, using misspecified models for both the outcome regression and the propensity score. To analyze the question of when the use of two misspecified models is better than one, we derive necessary and sufficient conditions under which the DR estimator has a smaller bias than a simple IPW estimator, and under which it has a smaller bias than an IPW estimator with normalized weights. If the misspecification of the outcome model is moderate, the comparisons of the biases suggest that the DR estimator has a smaller bias than the IPW estimators. However, all biases include the propensity score model error, and we advise researchers to model the propensity score carefully whenever such a model is involved.
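The estimators compared above can be sketched in a small simulation. Everything below (the data-generating model, coefficients, and sample size) is an illustrative assumption, not taken from the paper, and the true propensity score is plugged in where a fitted, possibly misspecified, model would appear in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Illustrative simulated data with a single confounder X.
X = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-0.5 * X))          # true propensity score P(T=1|X)
T = rng.binomial(1, e)
Y = 1.0 + 2.0 * T + X + rng.normal(size=n)  # true average causal effect = 2

# Simple IPW (Horvitz-Thompson) estimator; in applications e would be
# estimated from a (possibly misspecified) propensity score model.
ipw = np.mean(T * Y / e) - np.mean((1 - T) * Y / (1 - e))

# IPW estimator with normalized (Hajek) weights.
ipw_norm = (np.sum(T * Y / e) / np.sum(T / e)
            - np.sum((1 - T) * Y / (1 - e)) / np.sum((1 - T) / (1 - e)))

# Doubly robust (AIPW) estimator: augment IPW with outcome regressions
# m1(x), m0(x), here fit by ordinary least squares on [1, X].
def ols_predict(Xmat, y, mask):
    beta, *_ = np.linalg.lstsq(Xmat[mask], y[mask], rcond=None)
    return Xmat @ beta

Xmat = np.column_stack([np.ones(n), X])
m1 = ols_predict(Xmat, Y, T == 1)
m0 = ols_predict(Xmat, Y, T == 0)
dr = np.mean(m1 + T * (Y - m1) / e) - np.mean(m0 + (1 - T) * (Y - m0) / (1 - e))

print(ipw, ipw_norm, dr)  # all close to the true effect 2 in this setup
```

With both working models correct, as here, all three estimators are consistent; the paper's analysis concerns the biases that arise when these models are misspecified.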

    Covariate selection for non-parametric estimation of treatment effects

    In observational studies, the non-parametric estimation of a binary treatment effect is often performed by matching each treated individual with a control unit which is similar in observed characteristics (covariates). In practical applications, the reservoir of available covariates may be extensive, and the question arises of which covariates should be matched for. The current practice consists in matching for covariates which are not balanced between the treated and the control groups, i.e. covariates affecting the treatment assignment. This paper develops a theory based on graphical models, whose results emphasize the need for methods looking both at how the covariates affect the treatment assignment and at how they affect the outcome. Furthermore, we propose identification algorithms to select a minimal set of covariates to match for. An application to the estimation of the effect of a social program is used to illustrate the implementation of such algorithms.
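A minimal sketch of the matching idea the abstract refers to: nearest-neighbour matching with replacement on Euclidean covariate distance, under a constant treatment effect. The data-generating model is an illustrative assumption; the paper's graphical selection algorithms are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000

# Illustrative data: two covariates affecting both treatment and outcome.
X = rng.normal(size=(n, 2))
e = 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1])))       # treatment assignment
T = rng.binomial(1, e)
Y = X[:, 0] + X[:, 1] + 1.5 * T + rng.normal(size=n)   # true effect = 1.5

treated, control = X[T == 1], X[T == 0]
Yt, Yc = Y[T == 1], Y[T == 0]

# 1-nearest-neighbour matching with replacement: for each treated unit,
# find the control unit closest in covariate space.
d = ((treated[:, None, :] - control[None, :, :]) ** 2).sum(axis=2)
match = d.argmin(axis=1)

att = np.mean(Yt - Yc[match])  # matching estimate of the effect on the treated
print(att)
```

The choice of which columns of `X` enter the distance is exactly the covariate-selection question the paper addresses.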

    Data-driven Algorithms for Dimension Reduction in Causal Inference

    In observational studies, the causal effect of a treatment may be confounded with variables that are related to both the treatment and the outcome of interest. In order to identify a causal effect, such studies often rely on the unconfoundedness assumption, i.e., that all confounding variables are observed. The choice of covariates to control for, which is primarily based on subject matter knowledge, may result in a large covariate vector in the attempt to ensure that unconfoundedness holds. However, including redundant covariates can affect the bias and efficiency of nonparametric causal effect estimators, e.g., due to the curse of dimensionality. Data-driven algorithms for the selection of sufficient covariate subsets are investigated. Under the assumption of unconfoundedness, the algorithms search for minimal subsets of the covariate vector. Based, e.g., on the framework of sufficient dimension reduction or kernel smoothing, the algorithms perform a backward elimination procedure assessing the significance of each covariate. Their performance is evaluated in simulations, and an application using data from the Swedish Childhood Diabetes Register is also presented.
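The backward-elimination loop at the heart of such algorithms can be sketched as follows. The paper's versions use nonparametric significance assessments (sufficient dimension reduction or kernel smoothing); this sketch substitutes a simple parametric OLS t-test criterion, and the five-covariate scenario is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3_000

# Five candidate covariates; only the first two matter (illustrative setup).
X = rng.normal(size=(n, 5))
T = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + X[:, 1]))))
Y = T + X[:, 0] + X[:, 1] + rng.normal(size=n)

def backward_eliminate(X, y):
    """Repeatedly drop the covariate with the smallest |t|-statistic in an
    OLS fit until every remaining covariate is significant. A parametric
    stand-in for the nonparametric tests used in the paper."""
    keep = list(range(X.shape[1]))
    while keep:
        D = np.column_stack([np.ones(len(y)), X[:, keep]])
        beta, *_ = np.linalg.lstsq(D, y, rcond=None)
        resid = y - D @ beta
        sigma2 = resid @ resid / (len(y) - D.shape[1])
        se = np.sqrt(sigma2 * np.diag(np.linalg.inv(D.T @ D)))
        tstats = np.abs(beta[1:] / se[1:])   # skip the intercept
        worst = tstats.argmin()
        if tstats[worst] > 2.58:             # ~ alpha = 0.01, normal approx.
            break
        keep.pop(worst)
    return keep

# Select covariates predicting the outcome among the controls, one of the
# subsets such algorithms target.
selected = backward_eliminate(X[T == 0], Y[T == 0])
print(selected)
```

In this setup the loop retains the two covariates that actually enter the outcome model and discards the redundant ones.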

    Effects of correlated covariates on the efficiency of matching and inverse probability weighting estimators for causal inference

    In observational studies the overall aim when fitting a model for the propensity score is to reduce bias for an estimator of the causal effect. For this purpose guidelines for covariate selection for propensity score models have been proposed in the causal inference literature. To make the assumption of an unconfounded treatment plausible, researchers might be tempted to include many, possibly correlated, covariates in the propensity score model. In this paper we study how the efficiency of matching and inverse probability weighting estimators for average causal effects changes when the covariates are correlated. We investigate the case with multivariate normal covariates and linear models for the propensity score and potential outcomes, and show results under different model assumptions. We show that the correlation can both increase and decrease the large sample variances of the estimators, and that the correlation affects the efficiency of the estimators differently, both with regard to direction and magnitude. Moreover, the strength of the confounding towards the outcome and the treatment plays an important role.
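The phenomenon can be probed with a small Monte Carlo experiment: compare the empirical variance of a simple IPW estimator under two covariate correlations. The setup below is an illustrative assumption; in this particular configuration the variance increases with correlation, but the paper shows that the direction depends on the data-generating process.

```python
import numpy as np

rng = np.random.default_rng(5)

def ipw_var(rho, reps=500, n=1_000):
    """Empirical variance of a simple IPW estimator when the two
    covariates have correlation rho (illustrative setup)."""
    est = np.empty(reps)
    L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
    for r in range(reps):
        X = rng.normal(size=(n, 2)) @ L.T
        e = 1 / (1 + np.exp(-(0.5 * X[:, 0] + 0.5 * X[:, 1])))
        T = rng.binomial(1, e)
        Y = T + X[:, 0] + X[:, 1] + rng.normal(size=n)
        est[r] = np.mean(T * Y / e) - np.mean((1 - T) * Y / (1 - e))
    return est.var()

v0, v8 = ipw_var(0.0), ipw_var(0.8)
print(v0, v8)  # variance differs between the two correlation levels
```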

    Contrasting Identifying Assumptions of Average Causal Effects: Robustness and Semiparametric Efficiency

    Semiparametric inference on average causal effects from observational data is based on assumptions yielding identification of the effects. In practice, several distinct identifying assumptions may be plausible, and an analyst has to make a delicate choice between these models. In this paper, we study three identifying assumptions based on the potential outcome framework: the back-door assumption, which uses pre-treatment covariates; the front-door assumption, which uses mediators; and the two-door assumption, which uses pre-treatment covariates and mediators simultaneously. We provide the efficient influence functions and the corresponding semiparametric efficiency bounds that hold under these assumptions and their combinations. We demonstrate that none of the identification models uniformly provides the most efficient estimation, and give conditions under which some bounds are lower than others. We show when semiparametric estimating equation estimators based on influence functions attain the bounds, and study the robustness of the estimators to misspecification of the nuisance models. The theory is complemented with simulation experiments on the finite sample behavior of the estimators. The results obtained are relevant for an analyst facing a choice between several plausible identifying assumptions and corresponding estimators. Our results show that this choice implies a trade-off between efficiency and robustness to misspecification of the nuisance models.

    Inverse probability of treatment weighting with generalized linear outcome models for doubly robust estimation

    There are now many options for doubly robust estimation; however, there is a concerning trend in the applied literature to believe that the combination of a propensity score and an adjusted outcome model automatically results in a doubly robust estimator, and/or to misuse more complex established doubly robust estimators. A simple alternative, canonical link generalized linear models (GLM) fit via inverse probability of treatment (propensity score) weighted maximum likelihood estimation followed by standardization (the g-formula) for the average causal effect, is a doubly robust estimation method. Our aim is for the reader not just to be able to use this method, which we refer to as IPTW GLM, for doubly robust estimation, but to fully understand why it has the doubly robust property. For this reason, we define clearly, and in multiple ways, all concepts needed to understand the method and why it is doubly robust. In addition, we want to make very clear that the mere combination of propensity score weighting and an adjusted outcome model does not generally result in a doubly robust estimator. Finally, we hope to dispel the misconception that one can adjust for residual confounding remaining after propensity score weighting by adjusting in the outcome model for what remains `unbalanced', even when using doubly robust estimators. We provide R code for our simulations and real open-source data examples that can be followed step-by-step to use and, hopefully, understand the IPTW GLM method. We also compare with a much better-known but still simple doubly robust estimator.
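The three-step procedure the abstract describes (fit a propensity score model, fit a canonical-link outcome GLM by IPT-weighted maximum likelihood, then standardize via the g-formula) can be sketched as follows. The simulated data and coefficients are illustrative assumptions; the outcome GLM here is Gaussian with identity link, whose weighted maximum likelihood fit reduces to weighted least squares.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# Illustrative simulated data (names and coefficients are assumptions).
X = rng.normal(size=n)
T = rng.binomial(1, 1 / (1 + np.exp(-0.8 * X)))
Y = 0.5 + 1.0 * T + 1.5 * X + rng.normal(size=n)  # true average effect = 1

def logit_fit(Z, t, iters=25):
    """Logistic regression by Newton-Raphson (canonical-link binomial GLM)."""
    beta = np.zeros(Z.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-Z @ beta))
        W = p * (1 - p)
        beta += np.linalg.solve(Z.T @ (W[:, None] * Z), Z.T @ (t - p))
    return beta

# Step 1: fit the propensity score model and form IPT weights.
Z = np.column_stack([np.ones(n), X])
ps = 1 / (1 + np.exp(-Z @ logit_fit(Z, T)))
w = T / ps + (1 - T) / (1 - ps)

# Step 2: fit the outcome GLM by IPT-weighted maximum likelihood
# (Gaussian/identity link, so weighted least squares).
D = np.column_stack([np.ones(n), T, X])
beta, *_ = np.linalg.lstsq(np.sqrt(w)[:, None] * D, np.sqrt(w) * Y, rcond=None)

# Step 3: standardize (g-formula): predict for everyone with T set to 1
# and to 0, and average the difference.
D1, D0 = D.copy(), D.copy()
D1[:, 1], D0[:, 1] = 1.0, 0.0
ate = np.mean(D1 @ beta) - np.mean(D0 @ beta)
print(ate)
```

The double robustness property means the final estimate remains consistent if either Step 1 or Step 2 uses a correctly specified model; in this sketch both happen to be correct.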

    Propensity score weighting plus an adjusted proportional hazards model does not equal doubly robust away from the null

    Recently it has become common for applied work to combine commonly used survival analysis modeling methods, such as the multivariable Cox model, with propensity score weighting, with the intention of forming a doubly robust estimator that is unbiased in large samples when either the Cox model or the propensity score model is correctly specified. This combination does not, in general, produce a doubly robust estimator, even after regression standardization, when there is truly a causal effect. We demonstrate via simulation this lack of double robustness for the semiparametric Cox model, the Weibull proportional hazards model, and a simple proportional hazards flexible parametric model, with both of the latter models fit via maximum likelihood. We provide a novel proof that the combination of propensity score weighting and a proportional hazards survival model, fit either via full or partial likelihood, is consistent under the null of no causal effect of the exposure on the outcome, under particular censoring mechanisms, if either the propensity score or the outcome model is correctly specified and contains all confounders. Given our results suggesting that double robustness only exists under the null, we outline two simple alternative estimators that are doubly robust for the survival difference at a given time point (in the above sense), provided the censoring mechanism can be correctly modeled, and one doubly robust method of estimation for the full survival curve. We provide R code to use these estimators for estimation and inference in the supplementary materials.

    Correlation and efficiency of propensity score-based estimators for average causal effects

    Propensity score-based estimators are commonly used to estimate causal effects in evaluation research. To reduce bias in observational studies, researchers might be tempted to include many, perhaps correlated, covariates when estimating the propensity score model. Taking into account that the propensity score is estimated, this study investigates how the efficiency of matching, inverse probability weighting and doubly robust estimators changes when the covariates are correlated. Propositions regarding the large sample variances under certain assumptions on the data generating process are given. The propositions are supplemented by several numerical large sample and finite sample results from a wide range of models. The results show that the correlation may increase or decrease the variances of the estimators. There are several factors that influence how correlation affects the variance of the estimators, including the choice of estimator, the strength of the confounding towards outcome and treatment, and whether a constant or non-constant causal effect is present.

    Introduction to statistical simulations in health research

    In health research, statistical methods are frequently used to address a wide variety of research questions. For almost every analytical challenge, different methods are available. But how do we choose between different methods, and how do we judge whether the chosen method is appropriate for our specific study? As in any science, in statistics experiments can be run to find out which methods should be used under which circumstances. The main objective of this paper is to demonstrate that simulation studies, that is, experiments investigating synthetic data with known properties, are an invaluable tool for addressing these questions. We aim to provide a first introduction to simulation studies for data analysts or, more generally, for researchers involved at different levels in the analyses of health data, who (1) may rely on simulation studies published in the statistical literature to choose their statistical methods and who, thus, need to understand the criteria for assessing the validity and relevance of simulation results and their interpretation; and/or (2) need to understand the basic principles of designing statistical simulations in order to efficiently collaborate with more experienced colleagues or start learning to conduct their own simulations. We illustrate the implementation of a simulation study and the interpretation of its results through a simple example inspired by recent literature, which is completely reproducible using the R script available from online supplemental file 1.
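The anatomy of such a simulation study — generate data with a known truth, apply competing methods, repeat many times, and summarize performance measures such as bias — can be sketched in a few lines. The scenario (a confounded treatment effect, comparing an unadjusted with a covariate-adjusted estimator) is an illustrative assumption, not the example from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def one_replication(n=200):
    # Known truth: the effect of T on Y is 1, confounded by X.
    X = rng.normal(size=n)
    T = rng.binomial(1, 1 / (1 + np.exp(-X)))
    Y = 1.0 * T + X + rng.normal(size=n)
    naive = Y[T == 1].mean() - Y[T == 0].mean()   # ignores the confounder
    # Covariate-adjusted estimate: T-coefficient in OLS of Y on [1, T, X].
    D = np.column_stack([np.ones(n), T, X])
    adj = np.linalg.lstsq(D, Y, rcond=None)[0][1]
    return naive, adj

# Repeat, then compare each method's average estimate with the known truth.
reps = np.array([one_replication() for _ in range(1000)])
bias = reps.mean(axis=0) - 1.0   # Monte Carlo estimate of bias per method
print(bias)  # naive estimator is biased upward; adjusted is near zero
```

Because the truth is known by construction, performance measures such as bias, empirical variance, and coverage can be computed directly, which is precisely what observational data cannot offer.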