
    Large-sample study of the kernel density estimators under multiplicative censoring

    The multiplicative censoring model introduced in Vardi [Biometrika 76 (1989) 751--761] is an incomplete data problem whereby two independent samples from the lifetime distribution $G$, $\mathcal{X}_m=(X_1,\ldots,X_m)$ and $\mathcal{Z}_n=(Z_1,\ldots,Z_n)$, are observed subject to a form of coarsening. Specifically, sample $\mathcal{X}_m$ is fully observed while $\mathcal{Y}_n=(Y_1,\ldots,Y_n)$ is observed instead of $\mathcal{Z}_n$, where $Y_i=U_iZ_i$ and $(U_1,\ldots,U_n)$ is an independent sample from the standard uniform distribution. Vardi [Biometrika 76 (1989) 751--761] showed that this model unifies several important statistical problems, such as the deconvolution of an exponential random variable, estimation under a decreasing density constraint, and an estimation problem in renewal processes. In this paper, we establish the large-sample properties of kernel density estimators under the multiplicative censoring model. We first construct a strong approximation for the process $\sqrt{k}(\hat{G}-G)$, where $\hat{G}$ is a solution of the nonparametric score equation based on $(\mathcal{X}_m,\mathcal{Y}_n)$, and $k=m+n$ is the total sample size. Using this strong approximation and a result on the global modulus of continuity, we establish conditions for the strong uniform consistency of kernel density estimators. We also make use of this strong approximation to study the weak convergence and integrated squared error properties of these estimators. We conclude by extending our results to the setting of length-biased sampling.
    Comment: Published at http://dx.doi.org/10.1214/11-AOS954 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
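
    The observation scheme described above is simple to simulate. The following is an illustrative sketch (not code from the paper), taking $G$ to be a unit-rate exponential distribution purely for concreteness:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sketch: simulate data under Vardi's multiplicative
# censoring model. The Exponential(1) lifetime distribution G and the
# sample sizes are arbitrary choices for this example.
m, n = 200, 300
x = rng.exponential(scale=1.0, size=m)   # fully observed sample X_m
z = rng.exponential(scale=1.0, size=n)   # latent sample Z_n (never seen)
u = rng.uniform(size=n)                  # standard uniform multipliers
y = u * z                                # observed coarsened sample Y_n

# Since E[U] = 1/2 and U is independent of Z, E[Y] = E[Z] / 2, so the
# empirical mean of Y should sit near half that of X.
print(x.mean(), y.mean())
```

    A kernel density estimator built from (X_m, Y_n), as studied in the paper, would then have to undo this coarsening; the simulation only illustrates the data structure, not the estimator.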

    Second-Order Inference for the Mean of a Variable Missing at Random

    We present a second-order estimator of the mean of a variable subject to missingness, under the missing at random assumption. The estimator improves upon existing methods by using an approximate second-order expansion of the parameter functional, in addition to the first-order expansion employed by standard doubly robust methods. This results in weaker assumptions about the convergence rates necessary to establish consistency, local efficiency, and asymptotic linearity. The general estimation strategy is developed under the targeted minimum loss-based estimation (TMLE) framework. We present a simulation comparing the sensitivity of the first- and second-order estimators to the convergence rate of the initial estimators of the outcome regression and missingness score. In our simulation, the second-order TMLE improved the coverage probability of a confidence interval by up to 85%. In addition, we present a first-order estimator inspired by a second-order expansion of the parameter functional. This estimator requires only one-dimensional smoothing, whereas implementation of the second-order TMLE generally requires kernel smoothing on the covariate space. The proposed first-order estimator is expected to have improved finite-sample performance compared to existing first-order estimators. In our simulations, the proposed first-order estimator improved the coverage probability by up to 90%. We provide an illustration of our methods using a publicly available dataset to determine the effect of an anticoagulant on health outcomes of patients undergoing percutaneous coronary intervention. We provide R code implementing the proposed estimator.
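
    For context, the standard first-order doubly robust (AIPW) estimator that such second-order methods build upon can be sketched as follows. This is a generic illustration, not the paper's second-order TMLE; the data-generating process is invented, and the true nuisance functions are plugged in for simplicity:

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch of the standard first-order doubly robust (AIPW) estimator of
# E[Y] under missingness at random. All model choices are illustrative.
n = 5000
w = rng.normal(size=n)                    # covariate
g = 1 / (1 + np.exp(-(0.5 + w)))          # missingness score P(A=1 | W)
a = rng.binomial(1, g)                    # A=1 means Y is observed
y_full = 1 + 2 * w + rng.normal(size=n)   # outcome, E[Y | W] = 1 + 2W
y = np.where(a == 1, y_full, np.nan)      # observed data: Y missing if A=0

# True nuisances used for illustration; in practice both are estimated,
# and the estimator is consistent if either one is estimated correctly.
m_hat = 1 + 2 * w                         # outcome regression estimate
g_hat = g                                 # missingness score estimate

# AIPW: plug-in prediction plus an inverse-probability-weighted residual.
psi = np.mean(m_hat + a / g_hat * (np.nan_to_num(y) - m_hat))
print(psi)                                # targets E[Y] = 1
```

    The `np.nan_to_num` call is safe here because the residual term is multiplied by `a`, which is zero exactly where `y` is missing.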

    Nonparametric Incidence Estimation From Prevalent Cohort Survival Data

    Incidence is an important epidemiologic concept, particularly useful in assessing an intervention, quantifying disease risk, and planning health resources. Incident cohort studies constitute the gold standard in estimating disease incidence. However, due to material constraints, data are often collected from prevalent cohort studies, whereby diseased individuals are recruited through a cross-sectional survey and followed forward in time. We discuss the identifiability of measures of incidence in the context of prevalent cohort survival studies and derive nonparametric maximum likelihood estimators and their asymptotic properties. The proposed methodology accounts for calendar-time and age-at-onset variation in disease incidence while also addressing common complications arising from the sampling scheme, hence providing flexible and robust estimates. We also discuss age-specific incidence and adjustments for temporal variations in survival. We apply our methodology to data from the Canadian Study of Health and Aging and provide insight into temporal trends in the incidence of dementia in the Canadian elderly population.
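
    The key sampling complication mentioned above is easy to see numerically: recruiting cases alive at a cross-section over-samples long disease durations. A minimal sketch (not the paper's estimator), assuming stationary incidence so that observed durations are length-biased with density proportional to t f(t):

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative sketch of length bias under prevalent-cohort sampling.
# For an Exponential(1) duration, the length-biased law is Gamma(2, 1),
# so the biased mean is 2 rather than 1. The lifetime model is an
# arbitrary choice for this demonstration.
t = rng.exponential(scale=1.0, size=200_000)

# Rejection sampling: accept each duration with probability
# proportional to its length, which implements density t * f(t).
keep = rng.uniform(size=t.size) * t.max() < t
biased = t[keep]

print(t.mean(), biased.mean())   # roughly 1 versus roughly 2
```

    A naive analysis of the prevalent sample would therefore overstate typical survival with the disease, which is one of the biases the nonparametric maximum likelihood approach is designed to correct.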

    Computerizing Efficient Estimation of a Pathwise Differentiable Target Parameter

    Frangakis et al. (2015) proposed a numerical method for computing the efficient influence function of a parameter in a nonparametric model at a specified distribution and observation (provided such an influence function exists). Their approach is based on the assumption that the efficient influence function is given by the directional derivative of the target parameter mapping in the direction of a perturbation of the data distribution, defined as the convex line from the data distribution to a pointmass at the observation. In our discussion paper, Luedtke et al. (2015), we proposed a regularization of this procedure and established the validity of the method in great generality. In this article we propose a generalization of the latter regularized numerical delta method for computing the efficient influence function for general statistical models, and formally establish its validity under appropriate regularity conditions. Our proposed method consists of applying the regularized numerical delta method for nonparametrically defined target parameters proposed in Luedtke et al. (2015) to the nonparametrically defined maximum likelihood mapping that maps a data distribution (normally the empirical distribution) into its Kullback-Leibler projection onto the model. This method formalizes the notion that an algorithm for computing a maximum likelihood estimator also yields an algorithm for computing the efficient influence function at a user-supplied data distribution. We generalize this method to a minimum loss-based mapping. We also show how the method extends to compute the higher-order efficient influence function at an observation pair for higher-order pathwise differentiable target parameters. Finally, we propose a new method for computing the efficient influence function as a whole curve by applying the maximum likelihood mapping to a perturbation of the data distribution with score equal to an initial gradient of the pathwise derivative. We demonstrate each method with a variety of examples.
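
    The core directional-derivative idea can be sketched on a toy parameter. The following illustration (not the paper's algorithm) takes the target parameter to be the mean, whose efficient influence function in the nonparametric model is known to be o - E_P[X], and recovers it by finite differencing along the convex line toward a pointmass:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy sketch of the numerical-delta idea: differentiate the parameter
# mapping along the line from the empirical distribution toward a
# pointmass at observation o. The parameter (the mean) and the epsilon
# value are illustrative choices.
x = rng.normal(loc=3.0, size=1000)

def psi(weights, support):
    """Parameter mapping Psi(P) = E_P[X] for a discrete distribution."""
    return np.sum(weights * support)

def eif_numeric(o, sample, eps=1e-6):
    support = np.append(sample, o)
    n = len(sample)
    p_n = np.append(np.full(n, 1 / n), 0.0)    # empirical distribution
    delta_o = np.append(np.zeros(n), 1.0)      # pointmass at o
    p_eps = (1 - eps) * p_n + eps * delta_o    # perturbed distribution
    return (psi(p_eps, support) - psi(p_n, support)) / eps

# For the mean, the influence function at o is o - mean(sample), so the
# finite difference should recover it (exactly, since Psi is linear).
o = 5.0
print(eif_numeric(o, x), o - x.mean())
```

    Because the mean is linear in P, the finite difference is exact here; for genuinely nonlinear parameters the regularization discussed in the article controls the error of this approximation.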

    On-Demand Virtual Research Environments using Microservices

    The computational demands of scientific applications are continuously increasing. The emergence of cloud computing has enabled on-demand resource allocation. However, relying solely on infrastructure as a service does not achieve the degree of flexibility required by the scientific community. Here we present a microservice-oriented methodology, where scientific applications run in a distributed orchestration platform as software containers, referred to as on-demand virtual research environments. The methodology is vendor agnostic and we provide an open source implementation that supports the major cloud providers, offering scalable management of scientific pipelines. We demonstrate the applicability and scalability of our methodology in life science applications, but the methodology is general and can be applied to other scientific domains.

    An Omnibus Nonparametric Test of Equality in Distribution for Unknown Functions

    We present a novel family of nonparametric omnibus tests of the hypothesis that two unknown but estimable functions are equal in distribution when applied to the observed data structure. We derived these tests, which generalize the maximum mean discrepancy tests described in Gretton et al. [2006], using recent developments from the higher-order pathwise differentiability literature. Despite their complex derivation, the associated test statistics can be expressed rather simply as U-statistics. We study the asymptotic behavior of the proposed tests under the null hypothesis and under both fixed and local alternatives. We provide examples to which our tests can be applied and show that they perform well in a simulation study. As an important special case, our proposed tests can be used to determine whether an unknown function, such as the conditional average treatment effect, is equal to zero almost surely.
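
    To make the U-statistic form concrete, here is a minimal sketch of the classical maximum mean discrepancy U-statistic of Gretton et al. [2006] that these tests generalize; the Gaussian kernel, bandwidth, and sample sizes are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)

# Minimal sketch of an MMD^2 U-statistic for one-dimensional samples,
# in the spirit of Gretton et al. [2006]. Kernel and bandwidth are
# illustrative choices, not taken from the paper.
def gaussian_kernel(a, b, bandwidth=1.0):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bandwidth ** 2))

def mmd_u(x, y):
    """Unbiased (U-statistic) estimate of MMD^2 between two samples."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x)
    kyy = gaussian_kernel(y, y)
    kxy = gaussian_kernel(x, y)
    # Exclude diagonal terms so the within-sample averages are unbiased.
    term_x = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * kxy.mean()

same = mmd_u(rng.normal(size=400), rng.normal(size=400))
diff = mmd_u(rng.normal(size=400), rng.normal(loc=1.0, size=400))
print(same, diff)   # the shifted alternative should give a larger value
```

    Under the null the statistic fluctuates around zero, while a mean shift drives it away from zero; calibration of the rejection threshold (e.g. by permutation) is omitted in this sketch.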