Model misspecification and bias for inverse probability weighting and doubly robust estimators
In the causal inference literature, an estimator belonging to a class of
semi-parametric estimators is called robust if it has desirable properties
under the assumption that at least one of the working models is correctly
specified. In this paper we propose a crude analytical approach to studying the
large sample bias of semi-parametric estimators of the average causal effect
when all working models are misspecified. We apply our approach to three
prototypical estimators: two inverse probability weighting (IPW) estimators,
using a misspecified propensity score model, and a doubly robust (DR)
estimator, using misspecified models for both the outcome regression and the
propensity score. To analyze the question of when the use of two misspecified
models is better than one, we derive necessary and sufficient conditions for
when the DR estimator has a smaller bias than a simple IPW estimator and when
it has a smaller bias than an IPW estimator with normalized weights. If the
misspecification of the outcome model is moderate, the comparisons of the
biases of the IPW and DR estimators suggest that the DR estimator has a smaller
bias than the IPW estimators. However, all biases include the propensity score
(PS) model error, and we suggest that a researcher be careful when modeling the
PS whenever such a model is involved.
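The three prototypical estimators compared above can be sketched in a few lines. The following is a minimal illustration (not the paper's code): a simple IPW estimator, an IPW estimator with normalized (Hajek) weights, and a DR (AIPW) estimator. The data-generating process and all variable names are our own assumptions, and the working models here are correctly specified, so all three estimators recover the true effect; misspecification can be studied by replacing `ps` or the outcome fits with wrong models.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating process (not from the paper):
x = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-x))                 # true propensity score
t = rng.binomial(1, p_true)
y = 1.0 + 2.0 * t + x + rng.normal(size=n)    # true average causal effect = 2

# Working propensity model: here the true one; swap in a misspecified
# fit to study the biases discussed in the abstract.
ps = p_true

# Simple (Horvitz-Thompson) IPW estimator
ipw = np.mean(t * y / ps) - np.mean((1 - t) * y / (1 - ps))

# IPW estimator with normalized (Hajek) weights
w1, w0 = t / ps, (1 - t) / (1 - ps)
ipw_norm = np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

# Outcome regressions fit by least squares within each treatment arm
X = np.column_stack([np.ones(n), x])
b1, *_ = np.linalg.lstsq(X[t == 1], y[t == 1], rcond=None)
b0, *_ = np.linalg.lstsq(X[t == 0], y[t == 0], rcond=None)
m1, m0 = X @ b1, X @ b0

# Doubly robust (AIPW) estimator
dr = np.mean(m1 - m0 + t * (y - m1) / ps - (1 - t) * (y - m0) / (1 - ps))
```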
Covariate selection for non-parametric estimation of treatment effects
In observational studies, the non-parametric estimation of a binary treatment effect is often performed by matching each treated individual with a control unit which is similar in observed characteristics (covariates). In practical applications, the reservoir of covariates available may be extensive and the question arises which covariates should be matched for. The current practice consists in matching for covariates which are not balanced for the treated and the control groups, i.e. covariates affecting the treatment assignment. This paper develops a theory based on graphical models, whose results emphasize the need for methods looking both at how the covariates affect the treatment assignment and the outcome. Furthermore, we propose identification algorithms to select a minimal set of covariates to match for. An application to the estimation of the effect of a social program is used to illustrate the implementation of such algorithms.
Data-driven Algorithms for Dimension Reduction in Causal Inference
In observational studies, the causal effect of a treatment may be confounded
with variables that are related to both the treatment and the outcome of
interest. In order to identify a causal effect, such studies often rely on the
unconfoundedness assumption, i.e., that all confounding variables are observed.
The choice of covariates to control for, which is primarily based on subject
matter knowledge, may result in a large covariate vector in the attempt to
ensure that unconfoundedness holds. However, including redundant covariates can
affect bias and efficiency of nonparametric causal effect estimators, e.g., due
to the curse of dimensionality. Data-driven algorithms for the selection of
sufficient covariate subsets are investigated. Under the assumption of
unconfoundedness the algorithms search for minimal subsets of the covariate
vector. Based, e.g., on the framework of sufficient dimension reduction or
kernel smoothing, the algorithms perform a backward elimination procedure
assessing the significance of each covariate. Their performance is evaluated in
simulations and an application using data from the Swedish Childhood Diabetes
Register is also presented.
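As an illustration of the general idea (not the paper's kernel-smoothing or sufficient-dimension-reduction procedures), a backward elimination over covariates might look as follows; here significance is judged by crude OLS t-statistics as a stand-in for the tests described above, and all names are hypothetical.

```python
import numpy as np

def backward_eliminate(X, y, names):
    """Backward elimination: repeatedly drop the covariate with the
    smallest |t|-statistic in an OLS fit of y on the remaining
    covariates, until every remaining covariate has |t| > 1.96."""
    keep = list(range(X.shape[1]))
    while keep:
        Z = np.column_stack([np.ones(len(y)), X[:, keep]])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        sigma2 = resid @ resid / (len(y) - Z.shape[1])
        se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Z.T @ Z)))
        tstats = np.abs(beta[1:] / se[1:])
        worst = int(np.argmin(tstats))
        if tstats[worst] > 1.96:      # all survivors are significant
            break
        keep.pop(worst)
    return [names[j] for j in keep]

rng = np.random.default_rng(3)
n = 5_000
X = rng.normal(size=(n, 3))
y = X[:, 0] + X[:, 1] + rng.normal(size=n)   # x3 is redundant
selected = backward_eliminate(X, y, ["x1", "x2", "x3"])
```

In this toy run the two covariates that actually drive the outcome survive the elimination; the paper's algorithms replace the OLS tests with nonparametric significance assessments.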
Effects of correlated covariates on the efficiency of matching and inverse probability weighting estimators for causal inference
In observational studies the overall aim when fitting a model for the propensity score is to reduce bias for an estimator of the causal effect. For this purpose guidelines for covariate selection for propensity score models have been proposed in the causal inference literature. To make the assumption of an unconfounded treatment plausible researchers might be tempted to include many, possibly correlated, covariates in the propensity score model. In this paper we study how the efficiency of matching and inverse probability weighting estimators for average causal effects changes when the covariates are correlated. We investigate the case with multivariate normal covariates and linear models for the propensity score and potential outcomes and show results under different model assumptions. We show that the correlation can both increase and decrease the large sample variances of the estimators, and that the correlation affects the efficiency of the estimators differently, both with regard to direction and magnitude. Moreover, the strength of the confounding towards the outcome and the treatment plays an important role.
Contrasting Identifying Assumptions of Average Causal Effects: Robustness and Semiparametric Efficiency
Semiparametric inference on average causal effects from observational data is
based on assumptions yielding identification of the effects. In practice,
several distinct identifying assumptions may be plausible; an analyst has to
make a delicate choice between these models. In this paper, we study three
identifying assumptions based on the potential outcome framework: the back-door
assumption, which uses pre-treatment covariates; the front-door assumption,
which uses mediators; and the two-door assumption, which uses pre-treatment
covariates and mediators simultaneously. We provide the efficient influence
functions and the corresponding semiparametric efficiency bounds that hold
under these assumptions and their combinations. We demonstrate that none of
the identification models uniformly provides the most efficient estimation and
give conditions under which some bounds are lower than others. We show when
semiparametric estimating equation estimators based on influence functions
attain the bounds, and study the robustness of the estimators to
misspecification of the nuisance models. The theory is complemented with
simulation experiments on the finite sample behavior of the estimators. The
results obtained are relevant for an analyst facing a choice between several
plausible identifying assumptions and corresponding estimators. Our results
show that this choice implies a trade-off between efficiency and robustness to
misspecification of the nuisance models.
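For reference, the efficient influence function under the back-door assumption is the well-known AIPW form (with propensity score $e(x)$, outcome regressions $m_t(x) = E[Y \mid T=t, X=x]$, and target $\tau$); the front-door and two-door analogues studied in the paper are more involved:

```latex
\psi(O) = m_1(X) - m_0(X)
        + \frac{T\,\{Y - m_1(X)\}}{e(X)}
        - \frac{(1-T)\,\{Y - m_0(X)\}}{1 - e(X)} - \tau,
\qquad \tau = E[Y(1) - Y(0)].
```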
Inverse probability of treatment weighting with generalized linear outcome models for doubly robust estimation
There are now many options for doubly robust estimation; however, there is a
concerning trend in the applied literature to believe that the combination of a
propensity score and an adjusted outcome model automatically results in a
doubly robust estimator and/or to misuse more complex established doubly robust
estimators. A simple alternative, canonical link generalized linear models
(GLM) fit via inverse probability of treatment (propensity score) weighted
maximum likelihood estimation followed by standardization (the g-formula) for
the average causal effect, is a doubly robust estimation method. Our aim is for
the reader not just to be able to use this method, which we refer to as IPTW
GLM, for doubly robust estimation, but to fully understand why it has the
doubly robust property. For this reason, we define clearly, and in multiple
ways, all concepts needed to understand the method and why it is doubly robust.
In addition, we want to make very clear that the mere combination of propensity
score weighting and an adjusted outcome model does not generally result in a
doubly robust estimator. Finally, we hope to dispel the misconception that one
can adjust for residual confounding remaining after propensity score weighting
by adjusting in the outcome model for what remains `unbalanced' even when using
doubly robust estimators. We provide R code for our simulations and real
open-source data examples that can be followed step-by-step to use and
hopefully understand the IPTW GLM method. We also compare to a much
better-known but still simple doubly robust estimator.
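A minimal sketch of the three steps named above (inverse probability of treatment weights, weighted GLM fit, standardization via the g-formula) is given below. It uses an identity-link Gaussian GLM, for which weighted maximum likelihood reduces to weighted least squares, and a simulated data set of our own invention rather than the paper's examples; the true weights stand in for a fitted propensity score model.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical data-generating process; true average causal effect = 1.5
x = rng.normal(size=n)
ps_true = 1 / (1 + np.exp(-0.8 * x))
t = rng.binomial(1, ps_true)
y = 0.5 + 1.5 * t + x + 0.5 * t * x + rng.normal(size=n)

# Step 1: inverse probability of treatment weights from the PS
# (here the true PS; in practice, e.g. a fitted logistic regression)
w = t / ps_true + (1 - t) / (1 - ps_true)

# Step 2: outcome GLM fit by weighted maximum likelihood; for the
# Gaussian/identity-link case this is weighted least squares
X = np.column_stack([np.ones(n), t, x, t * x])
sw = np.sqrt(w)
beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

# Step 3: standardization (g-formula) -- predict every unit's outcome
# under t = 1 and under t = 0, then average the difference
X1 = np.column_stack([np.ones(n), np.ones(n), x, x])
X0 = np.column_stack([np.ones(n), np.zeros(n), x, np.zeros(n)])
ate = np.mean(X1 @ beta - X0 @ beta)
```

The double robustness argument in the paper concerns exactly this pipeline: the standardized estimate remains consistent when either the weights or the outcome GLM is correct, which is not guaranteed for an arbitrary pairing of weighting and outcome adjustment.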
Propensity score weighting plus an adjusted proportional hazards model does not equal doubly robust away from the null
Recently it has become common for applied works to combine commonly used
survival analysis modeling methods, such as the multivariable Cox model, and
propensity score weighting with the intention of forming a doubly robust
estimator that is unbiased in large samples when either the Cox model or the
propensity score model is correctly specified. This combination does not, in
general, produce a doubly robust estimator, even after regression
standardization, when there is truly a causal effect. We demonstrate via
simulation this lack of double robustness for the semiparametric Cox model, the
Weibull proportional hazards model, and a simple proportional hazards flexible
parametric model, with both the latter models fit via maximum likelihood. We
provide a novel proof that the combination of propensity score weighting and a
proportional hazards survival model, fit either via full or partial likelihood,
is consistent under the null of no causal effect of the exposure on the outcome
under particular censoring mechanisms if either the propensity score or the
outcome model is correctly specified and contains all confounders. Given our
results suggesting that double robustness only exists under the null, we
outline two simple alternative estimators that are doubly robust for the
survival difference at a given time point (in the above sense), provided the
censoring mechanism can be correctly modeled, and one doubly robust method of
estimation for the full survival curve. We provide R code to use these
estimators for estimation and inference in the supplementary materials.
Correlation and efficiency of propensity score-based estimators for average causal effects
Propensity score-based estimators are commonly used to estimate causal effects in evaluation research. To reduce bias in observational studies researchers might be tempted to include many, perhaps correlated, covariates when estimating the propensity score model. Taking into account that the propensity score is estimated, this study investigates how the efficiency of matching, inverse probability weighting and doubly robust estimators changes in the case of correlated covariates. Propositions regarding the large sample variances under certain assumptions on the data generating process are given. The propositions are supplemented by several numerical large sample and finite sample results from a wide range of models. The results show that the correlation may increase or decrease the variances of the estimators. There are several factors that influence how correlation affects the variance of the estimators, including the choice of estimator, the strength of the confounding towards outcome and treatment, and whether a constant or non-constant causal effect is present.
Introduction to statistical simulations in health research
In health research, statistical methods are frequently used to address a wide variety of research questions. For almost every analytical challenge, different methods are available. But how do we choose between different methods and how do we judge whether the chosen method is appropriate for our specific study? Like in any science, in statistics, experiments can be run to find out which methods should be used under which circumstances. The main objective of this paper is to demonstrate that simulation studies, that is, experiments investigating synthetic data with known properties, are an invaluable tool for addressing these questions. We aim to provide a first introduction to simulation studies for data analysts or, more generally, for researchers involved at different levels in the analyses of health data, who (1) may rely on simulation studies published in statistical literature to choose their statistical methods and who, thus, need to understand the criteria of assessing the validity and relevance of simulation results and their interpretation; and/or (2) need to understand the basic principles of designing statistical simulations in order to efficiently collaborate with more experienced colleagues or start learning to conduct their own simulations. We illustrate the implementation of a simulation study and the interpretation of its results through a simple example inspired by recent literature, which is completely reproducible using the R-script available from online supplemental file 1.
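The anatomy of such a simulation study — generate data with a known truth, apply each method, and aggregate performance measures over many replications — can be shown with a deliberately simple, hypothetical example of our own (not the paper's reproducible example): comparing the sample mean and the sample median as estimators of the centre of a normal distribution.

```python
import numpy as np

rng = np.random.default_rng(2)
n_sim, n = 2_000, 50
mu = 1.0                                  # known truth built into the design

means = np.empty(n_sim)
medians = np.empty(n_sim)
for i in range(n_sim):
    # Step 1: generate synthetic data with known properties
    sample = rng.normal(loc=mu, scale=1.0, size=n)
    # Step 2: apply each competing method to the same data
    means[i] = sample.mean()
    medians[i] = np.median(sample)

# Step 3: aggregate performance measures across replications
bias_mean = means.mean() - mu
bias_median = medians.mean() - mu
emp_se_mean = means.std(ddof=1)
emp_se_median = medians.std(ddof=1)
```

With normal data both estimators are essentially unbiased, but the empirical standard error of the mean is smaller than that of the median; bias and empirical SE are exactly the kinds of performance measures a published simulation study reports.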
