Search CORE

1,806 research outputs found

Familywise Error Rate Control via Knockoffs

Author: Janson Lucas
Su Weijie
Publication venue
Publication date: 09/11/2015
Field of study

We present a novel method for controlling the

k

-familywise error rate (

k

-FWER) in the linear regression setting using the knockoffs framework first introduced by Barber and Cand\`es. Our procedure, which we also refer to as knockoffs, can be applied with any design matrix with at least as many observations as variables, and does not require knowing the noise variance. Unlike other multiple testing procedures which act directly on

p

-values, knockoffs is specifically tailored to linear regression and implicitly accounts for the statistical relationships between hypothesis tests of different coefficients. We prove that knockoffs controls the

k

-FWER exactly in finite samples and show in simulations that it provides superior power to alternative procedures over a range of linear regression problems. We also discuss extensions to controlling other Type I error rates such as the false exceedance rate, and use it to identify candidates for mutations conferring drug-resistance in HIV.Comment: 15 pages, 3 figures. Updated reference

arXiv.org e-Print Archive

ScholarlyCommons@Penn

Robust Inference Under Heteroskedasticity via the Hadamard Estimator

Author: Dobriban Edgar
Su Weijie J.
Publication venue
Publication date: 01/07/2018
Field of study

Drawing statistical inferences from large datasets in a model-robust way is an important problem in statistics and data science. In this paper, we propose methods that are robust to large and unequal noise in different observational units (i.e., heteroskedasticity) for statistical inference in linear regression. We leverage the Hadamard estimator, which is unbiased for the variances of ordinary least-squares regression. This is in contrast to the popular White's sandwich estimator, which can be substantially biased in high dimensions. We propose to estimate the signal strength, noise level, signal-to-noise ratio, and mean squared error via the Hadamard estimator. We develop a new degrees of freedom adjustment that gives more accurate confidence intervals than variants of White's sandwich estimator. Moreover, we provide conditions ensuring the estimator is well-defined, by studying a new random matrix ensemble in which the entries of a random orthogonal projection matrix are squared. We also show approximate normality, using the second-order Poincare inequality. Our work provides improved statistical theory and methods for linear regression in high dimensions

arXiv.org e-Print Archive