Large-sample study of the kernel density estimators under multiplicative censoring
The multiplicative censoring model introduced in Vardi [Biometrika 76 (1989)
751--761] is an incomplete data problem whereby two independent samples from
the lifetime distribution $F$, $\mathcal{X}_m = (X_1, \ldots, X_m)$ and
$\mathcal{Z}_n = (Z_1, \ldots, Z_n)$, are observed subject to a form of coarsening.
Specifically, sample $\mathcal{X}_m$ is fully observed, while
$\mathcal{Y}_n = (Y_1, \ldots, Y_n)$ is observed instead of $\mathcal{Z}_n$, where
$Y_i = U_i Z_i$ and $(U_1, \ldots, U_n)$ is an independent sample from the standard
uniform distribution. Vardi [Biometrika 76 (1989) 751--761] showed that this
model unifies several important statistical problems, such as the deconvolution
of an exponential random variable, estimation under a decreasing density
constraint and an estimation problem in renewal processes. In this paper, we
establish the large-sample properties of kernel density estimators under the
multiplicative censoring model. We first construct a strong approximation for
the process $\sqrt{k}\,(\hat{F}_k - F)$, where $\hat{F}_k$ is a solution of the
nonparametric score equation based on the combined observed data, $F$ is the
lifetime distribution, and $k$
is the total sample size. Using this strong approximation and a result
on the global modulus of continuity, we establish conditions for the strong
uniform consistency of kernel density estimators. We also make use of this
strong approximation to study the weak convergence and integrated squared error
properties of these estimators. We conclude by extending our results to the
setting of length-biased sampling.

Comment: Published at http://dx.doi.org/10.1214/11-AOS954 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
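The coarsening mechanism is simple enough to simulate directly. The following sketch (purely illustrative: all variable names and parameters are hypothetical, and the kernel estimator shown is a naive pooled one, not the corrected estimators the paper studies) generates multiplicatively censored data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Vardi's multiplicative censoring model: Z_i are draws from the lifetime
# distribution, but only Y_i = U_i * Z_i is observed, with U_i ~ Uniform(0, 1)
# independent of Z_i. The X_i sample is observed in full.
m, n = 500, 500
x = rng.exponential(scale=2.0, size=m)  # fully observed sample from F
z = rng.exponential(scale=2.0, size=n)  # latent sample from F
u = rng.uniform(size=n)                 # standard uniform multipliers
y = u * z                               # observed, multiplicatively censored

def kde(points, grid, h):
    """Naive Gaussian kernel density estimate on a grid."""
    diffs = (grid[:, None] - points[None, :]) / h
    return np.exp(-0.5 * diffs**2).sum(axis=1) / (len(points) * h * np.sqrt(2 * np.pi))

grid = np.linspace(0.0, 10.0, 200)
f_naive = kde(np.concatenate([x, y]), grid, h=0.4)
```

Because each Y_i = U_i Z_i is shrunk toward zero, the naive pooled estimate overweights small lifetimes; the paper's analysis concerns kernel estimators that account for this coarsening.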
Second-Order Inference for the Mean of a Variable Missing at Random
We present a second-order estimator of the mean of a variable subject to
missingness, under the missing at random assumption. The estimator improves
upon existing methods by using an approximate second-order expansion of the
parameter functional, in addition to the first-order expansion employed by
standard doubly robust methods. This results in weaker assumptions about the
convergence rates necessary to establish consistency, local efficiency, and
asymptotic linearity. The general estimation strategy is developed under the
targeted minimum loss-based estimation (TMLE) framework. We present a
simulation comparing the sensitivity of the first and second order estimators
to the convergence rate of the initial estimators of the outcome regression and
missingness score. In our simulation, the second-order TMLE improved the
coverage probability of a confidence interval by up to 85%. In addition, we
present a first-order estimator inspired by a second-order expansion of the
parameter functional. This estimator only requires one-dimensional smoothing,
whereas implementation of the second-order TMLE generally requires kernel
smoothing on the covariate space. The first-order estimator proposed is
expected to have improved finite sample performance compared to existing
first-order estimators. In our simulations, the proposed first-order estimator
improved the coverage probability by up to 90%. We provide an illustration of
our methods using a publicly available dataset to determine the effect of an
anticoagulant on health outcomes of patients undergoing percutaneous coronary
intervention. We provide R code implementing the proposed estimator.
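For context, the standard first-order doubly robust (AIPW) estimator that approaches such as TMLE refine can be written in a few lines. This is a hypothetical illustration of the first-order baseline only, not the paper's second-order estimator or its R code:

```python
import numpy as np

def aipw_mean(y, a, mu_hat, pi_hat):
    """First-order doubly robust (AIPW) estimate of E[Y] when Y is observed
    only when A = 1, assuming missingness at random given covariates.

    y      : outcomes (arbitrary placeholder values where a == 0)
    a      : observation indicators (1 = observed)
    mu_hat : estimated outcome regression E[Y | A = 1, W] per unit
    pi_hat : estimated missingness score P(A = 1 | W) per unit
    """
    return np.mean(mu_hat + a / pi_hat * (y - mu_hat))

# Toy check: with no missingness, the inverse-weighting term alone
# recovers the sample mean even under a wrong outcome regression.
y = np.array([1.0, 2.0, 3.0, 4.0])
a = np.ones(4)
est = aipw_mean(y, a, mu_hat=np.zeros(4), pi_hat=np.ones(4))  # 2.5
```

The estimator is consistent if either nuisance estimate (`mu_hat` or `pi_hat`) converges to the truth; the paper's contribution is a second-order correction that weakens the rate requirements on both.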
Nonparametric Incidence Estimation From Prevalent Cohort Survival Data
Incidence is an important epidemiologic concept particularly useful in assessing an intervention, quantifying disease risk, and planning health resources. Incident cohort studies constitute the gold standard in estimating disease incidence. However, due to material constraints, data are often collected from prevalent cohort studies whereby diseased individuals are recruited through a cross-sectional survey and followed forward in time. We discuss the identifiability of measures of incidence in the context of prevalent cohort survival studies and derive nonparametric maximum likelihood estimators and their asymptotic properties. The proposed methodology accounts for calendar-time and age-at-onset variation in disease incidence while also addressing common complications arising from the sampling scheme, hence providing flexible and robust estimates. We also discuss age-specific incidence and adjustments for temporal variations in survival. We apply our methodology to data from the Canadian Study of Health and Aging and provide insight into temporal trends in the incidence of dementia in the Canadian elderly population.
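The sampling complication at the heart of prevalent cohort designs is that recruitment through a cross-sectional survey favors individuals with longer disease durations. A toy simulation (all parameters hypothetical; this is not the paper's estimator) makes the resulting length bias visible:

```python
import numpy as np

rng = np.random.default_rng(1)

# Disease onsets occur uniformly over calendar time [0, 100); durations are
# exponential with true mean 5. A cross-sectional survey at time 100 recruits
# only those still diseased (onset + duration > 100): a prevalent cohort.
N = 200_000
onset = rng.uniform(0.0, 100.0, size=N)
duration = rng.exponential(scale=5.0, size=N)
prevalent = onset + duration > 100.0

true_mean = 5.0
naive_mean = duration[prevalent].mean()
# Length-biased sampling: for an exponential duration the prevalent-cohort
# average duration is roughly twice the true mean.
```

Naively averaging over the recruited cohort roughly doubles the apparent mean duration here, which is why estimators for prevalent cohort data must correct for the sampling scheme.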
Computerizing Efficient Estimation of a Pathwise Differentiable Target Parameter
Frangakis et al. (2015) proposed a numerical method for computing the efficient influence function of a parameter in a nonparametric model at a specified distribution and observation (provided such an influence function exists). Their approach is based on the assumption that the efficient influence function is given by the directional derivative of the target parameter mapping in the direction of a perturbation of the data distribution, defined as the convex line from the data distribution to a pointmass at the observation. In our discussion paper Luedtke et al. (2015) we propose a regularization of this procedure and establish the validity of this method in great generality. In this article we propose a generalization of the latter regularized numerical delta method for computing the efficient influence function for general statistical models, and formally establish its validity under appropriate regularity conditions. Our proposed method consists of applying the regularized numerical delta method for nonparametrically defined target parameters proposed in Luedtke et al. (2015) to the nonparametrically defined maximum likelihood mapping that maps a data distribution (normally the empirical distribution) into its Kullback-Leibler projection onto the model. This method formalizes the notion that an algorithm for computing a maximum likelihood estimator also yields an algorithm for computing the efficient influence function at a user-supplied data distribution. We generalize this method to a minimum loss-based mapping. We also show how the method extends to compute the higher-order efficient influence function at an observation pair for higher-order pathwise differentiable target parameters. Finally, we propose a new method for computing the efficient influence function as a whole curve by applying the maximum likelihood mapping to a perturbation of the data distribution with score equal to an initial gradient of the pathwise derivative.
We demonstrate each method with a variety of examples.
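The directional-derivative idea can be sketched for a parameter whose efficient influence function is known in closed form. For the mean functional Psi(P) = E_P[X] in a nonparametric model, the efficient influence function at an observation o is o - E_P[X], and a finite-difference derivative along the convex line toward a pointmass at o recovers it. This is an illustrative sketch only; the function names are hypothetical and the paper's maximum likelihood mapping machinery is not reproduced:

```python
import numpy as np

def numerical_eif(psi, sample, o, eps=1e-6):
    """Finite-difference directional derivative of a parameter mapping psi
    along the path (1 - eps) * P_n + eps * delta_o, represented via weights.

    psi takes (values, weights) and returns the parameter value.
    """
    n = len(sample)
    base = psi(sample, np.full(n, 1.0 / n))
    # Perturbed distribution: move mass eps from the empirical distribution
    # to a pointmass at the observation o.
    values = np.append(sample, o)
    weights = np.append(np.full(n, (1.0 - eps) / n), eps)
    return (psi(values, weights) - base) / eps

def mean_functional(values, weights):
    return np.sum(weights * values)

x = np.array([1.0, 2.0, 3.0, 4.0])
# Analytic efficient influence function of the mean at o is o - mean(x),
# i.e. 10.0 - 2.5 = 7.5 here.
eif_at_10 = numerical_eif(mean_functional, x, o=10.0)
```

For the mean, the path is linear in eps, so the finite difference is essentially exact; the regularization discussed in the paper matters for parameters where the naive difference quotient is unstable.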
On-Demand Virtual Research Environments using Microservices
The computational demands for scientific applications are continuously
increasing. The emergence of cloud computing has enabled on-demand resource
allocation. However, relying solely on infrastructure as a service does not
achieve the degree of flexibility required by the scientific community. Here we
present a microservice-oriented methodology, where scientific applications run
in a distributed orchestration platform as software containers, referred to as
on-demand virtual research environments. The methodology is vendor-agnostic,
and we provide an open-source implementation that supports the major cloud
providers, offering scalable management of scientific pipelines. We demonstrate
the applicability and scalability of our methodology in life science
applications, but the approach is general and can be applied to other
scientific domains.
An Omnibus Nonparametric Test of Equality in Distribution for Unknown Functions
We present a novel family of nonparametric omnibus tests of the hypothesis that two unknown but estimable functions are equal in distribution when applied to the observed data structure. We developed these tests, which represent a generalization of the maximum mean discrepancy tests described in Gretton et al. [2006], using recent developments from the higher-order pathwise differentiability literature. Despite their complex derivation, the associated test statistics can be expressed rather simply as U-statistics. We study the asymptotic behavior of the proposed tests under the null hypothesis and under both fixed and local alternatives. We provide examples to which our tests can be applied and show that they perform well in a simulation study. As an important special case, our proposed tests can be used to determine whether an unknown function, such as the conditional average treatment effect, is equal to zero almost surely.
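As a point of reference, the maximum mean discrepancy statistic of Gretton et al. that these tests generalize can be computed as a simple U-statistic. The sketch below uses a Gaussian kernel on univariate samples and is illustrative only, not the paper's proposed test:

```python
import numpy as np

def mmd2_ustat(x, y, bandwidth=1.0):
    """Unbiased U-statistic estimate of the squared maximum mean discrepancy
    between samples x and y, using a Gaussian kernel."""
    def k(a, b):
        return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * bandwidth**2))

    m, n = len(x), len(y)
    kxx, kyy = k(x, x), k(y, y)
    # Exclude diagonal terms so each within-sample average is unbiased.
    term_x = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    term_xy = k(x, y).mean()
    return term_x + term_y - 2.0 * term_xy

rng = np.random.default_rng(2)
same = mmd2_ustat(rng.normal(size=500), rng.normal(size=500))
diff = mmd2_ustat(rng.normal(size=500), rng.normal(loc=2.0, size=500))
# The statistic is near zero when the two distributions agree and clearly
# positive when they differ.
```

In practice the null distribution of such a statistic is degenerate, which is why the asymptotic analysis under the null and under local alternatives in the paper is the delicate part.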
