
    EEBoost: a general method for prediction and variable selection based on estimating equations

    Abstract

The modern statistical literature is replete with methods for performing variable selection and prediction in standard regression problems. However, simple models may be misspecified or may fail to capture important aspects of the data-generating process, such as missingness, correlation, and over- or underdispersion. This realization has motivated the development of a large class of estimating equations which account for these data characteristics and often yield improved inference for low-dimensional parameters. In this paper we introduce EEBoost, a novel strategy for variable selection and prediction which can be applied in any problem where inference would typically be based on estimating equations. The method is simple, flexible, and easily implemented using existing software.

The EEBoost algorithm is obtained as a straightforward modification of the standard boosting (or functional gradient descent) technique. We show that EEBoost is closely related to a class of L1-constrained projected likelihood ratio minimizations, and therefore produces variable selection paths similar to those of penalized methods, without the need to apply constrained optimization algorithms.
The flexibility of EEBoost is illustrated by applying it to simulated examples with correlated outcomes (based on generalized estimating equations) and time-to-event data with missing covariates (based on inverse probability weighted estimating equations). In both cases, EEBoost outperforms standard variable selection methods that do not account for the relevant data characteristics.
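The core idea described in the abstract, boosting driven by an estimating function rather than a loss gradient, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, step size, number of steps, and the least-squares score equation used as a toy estimating function are all assumptions made here for concreteness.

```python
import numpy as np

def eeboost(g, p, n_steps=200, eps=0.01):
    """Sketch of an EEBoost-style update: at each step, evaluate the
    estimating function g(beta), find the covariate whose component has
    the largest magnitude, and nudge that coefficient by a small step.

    g       : callable returning the p-vector estimating function g(beta)
    p       : number of covariates
    n_steps : number of boosting iterations (illustrative default)
    eps     : step size (illustrative default)
    """
    beta = np.zeros(p)
    path = [beta.copy()]
    for _ in range(n_steps):
        grad = g(beta)
        j = int(np.argmax(np.abs(grad)))    # most "active" covariate
        beta[j] += eps * np.sign(grad[j])   # small step on that coordinate
        path.append(beta.copy())
    return np.array(path)                   # coefficient path over iterations

# Toy example: the least-squares score equation g(beta) = X.T @ (y - X @ beta),
# standing in for a more general estimating equation (e.g. a GEE).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.0]) + rng.normal(size=100)
path = eeboost(lambda b: X.T @ (y - X @ b), p=5)
```

Coefficients enter the path one at a time as their estimating-function components dominate, which is what gives the variable selection paths their resemblance to L1-penalized solution paths. Swapping in a GEE or inverse-probability-weighted estimating function for `g` changes nothing else in the loop.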