Familywise Error Rate Control via Knockoffs
We present a novel method for controlling the k-familywise error rate
(k-FWER) in the linear regression setting using the knockoffs framework first
introduced by Barber and Candès. Our procedure, which we also refer to as
knockoffs, can be applied with any design matrix with at least as many
observations as variables, and does not require knowing the noise variance.
Unlike other multiple testing procedures which act directly on p-values,
knockoffs is specifically tailored to linear regression and implicitly accounts
for the statistical relationships between hypothesis tests of different
coefficients. We prove that knockoffs controls the k-FWER exactly in finite
samples and show in simulations that it provides superior power to alternative
procedures over a range of linear regression problems. We also discuss
extensions to controlling other Type I error rates such as the false exceedance
rate, and use it to identify candidates for mutations conferring
drug-resistance in HIV.
Comment: 15 pages, 3 figures. Updated reference
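For readers unfamiliar with the error rates named in this abstract, the standard definitions can be written as follows. The symbols V (number of false rejections), R (total number of rejections), alpha (target level), and gamma (exceedance threshold) are notation assumed here for illustration and do not appear in the abstract itself.

```latex
% k-familywise error rate: probability of at least k false rejections.
% Setting k = 1 recovers the usual familywise error rate.
\[
  k\text{-FWER} \;=\; \Pr(V \ge k), \qquad \text{controlled at level } \alpha \text{ when } \Pr(V \ge k) \le \alpha .
\]
% False exceedance rate: probability that the false discovery proportion
% exceeds a chosen threshold \gamma \in (0, 1).
\[
  \mathrm{FDP} \;=\; \frac{V}{\max(R, 1)}, \qquad \mathrm{FDX} \;=\; \Pr(\mathrm{FDP} > \gamma).
\]
```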
Robust Inference Under Heteroskedasticity via the Hadamard Estimator
Drawing statistical inferences from large datasets in a model-robust way is
an important problem in statistics and data science. In this paper, we propose
methods that are robust to large and unequal noise in different observational
units (i.e., heteroskedasticity) for statistical inference in linear
regression. We leverage the Hadamard estimator, which is unbiased for the
variances of ordinary least-squares regression. This is in contrast to the
popular White's sandwich estimator, which can be substantially biased in high
dimensions. We propose to estimate the signal strength, noise level,
signal-to-noise ratio, and mean squared error via the Hadamard estimator. We
develop a new degrees of freedom adjustment that gives more accurate confidence
intervals than variants of White's sandwich estimator. Moreover, we provide
conditions ensuring the estimator is well-defined, by studying a new random
matrix ensemble in which the entries of a random orthogonal projection matrix
are squared. We also show approximate normality, using the second-order
Poincaré inequality. Our work provides improved statistical theory and methods
for linear regression in high dimensions.
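The idea of an unbiased variance estimate built from squared matrix entries can be illustrated with a short NumPy sketch. It assumes a fixed full-rank design X, independent noise with unequal variances, and that the entrywise square of the residual projection matrix is invertible. The function name and the simulated data are hypothetical, and the code is a reading of the abstract rather than the authors' implementation.

```python
import numpy as np


def hadamard_variance_estimator(X, y):
    """Sketch: OLS coefficients plus Hadamard-type variance estimates.

    Assumes X has full column rank and that Q * Q (entrywise square of the
    residual projection) is invertible; otherwise np.linalg.solve raises.
    """
    n, _ = X.shape
    S = np.linalg.solve(X.T @ X, X.T)   # p x n matrix with beta_hat = S @ y
    Q = np.eye(n) - X @ S               # projection onto the residual space
    beta_hat = S @ y
    resid = Q @ y
    # With independent noise, E[resid_i^2] = sum_k Q_ik^2 * sigma_k^2,
    # so solving (Q * Q) s = resid^2 gives an unbiased estimate s of the
    # per-observation noise variances (individual entries may be negative).
    sigma2_hat = np.linalg.solve(Q * Q, resid ** 2)
    # Var(beta_hat_j) = sum_i S_ji^2 * sigma_i^2, estimated by plugging in s.
    var_hat = (S * S) @ sigma2_hat
    return beta_hat, var_hat


# Hypothetical usage on simulated heteroskedastic data.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(50, 3))
noise_sd = rng.uniform(0.5, 2.0, size=50)
y_demo = X_demo @ np.array([1.0, 0.0, -1.0]) + rng.normal(size=50) * noise_sd
beta_demo, var_demo = hadamard_variance_estimator(X_demo, y_demo)
print(beta_demo, var_demo)
```

The entrywise products `Q * Q` and `S * S` are Hadamard products, which is where the estimator gets its name. Because the estimate is unbiased rather than constrained to be positive, a single draw can return a negative variance estimate.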
