Kernel Conjugate Gradient Methods with Random Projections
We propose and study kernel conjugate gradient methods (KCGM) with random
projections for least-squares regression over a separable Hilbert space.
Considering two types of random projections generated by randomized sketches
and Nystr\"{o}m subsampling, we prove optimal statistical results with respect
to variants of norms for the algorithms under a suitable stopping rule.
Particularly, our results show that if the projection dimension is proportional
to the effective dimension of the problem, KCGM with randomized sketches can
generalize optimally, while achieving a computational advantage. As a
corollary, we derive optimal rates for classic KCGM in the case that the target
function may not be in the hypothesis space, filling a theoretical gap.
Comment: 43 pages, 2 figures
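As a rough illustration of the ingredients involved, the sketch below runs plain conjugate gradient on a Nyström-reduced kernel least-squares system, with early stopping playing the role of regularization. The kernel, landmark count, and stopping rule are illustrative assumptions, not the estimator analyzed in the paper.

```python
# Minimal sketch (not the paper's exact estimator): kernel CG least squares
# with Nystroem subsampling; early stopping acts as implicit regularization.
# The names rbf and kcgm_nystroem and all parameter choices are illustrative.
import numpy as np

def rbf(A, B, gamma=1.0):
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def kcgm_nystroem(X, y, m=50, iters=10, gamma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)   # Nystroem landmarks
    Knm = rbf(X, X[idx], gamma)                       # n x m projected features
    # Solve the normal equations Knm^T Knm b = Knm^T y with plain CG,
    # stopping after `iters` steps (early stopping ~ regularization).
    A, rhs = Knm.T @ Knm, Knm.T @ y
    b = np.zeros(m); r = rhs - A @ b; p = r.copy()
    for _ in range(iters):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        b += alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p, r = r_new + beta * p, r_new
    return idx, b  # predict with rbf(X_test, X[idx], gamma) @ b
```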
Near-Optimal Noisy Group Testing via Separate Decoding of Items
The group testing problem consists of determining a small set of defective
items from a larger set of items based on a number of tests, and is relevant in
applications such as medical testing, communication protocols, pattern
matching, and more. In this paper, we revisit an efficient algorithm for noisy
group testing in which each item is decoded separately (Malyutov and Mateev,
1980), and develop novel performance guarantees via an information-theoretic
framework for general noise models. For the special cases of no noise and
symmetric noise, we find that the asymptotic number of tests required for
vanishing error probability is within a constant factor of the
information-theoretic optimum at low sparsity levels, and that with a small
fraction of allowed incorrectly decoded items, this guarantee extends to all
sublinear sparsity levels. In addition, we provide a converse bound showing
that if one tries to move slightly beyond our low-sparsity achievability
threshold using separate decoding of items and i.i.d. randomized testing, the
average number of items decoded incorrectly approaches that of a trivial
decoder.
Comment: Submitted to IEEE Journal of Selected Topics in Signal Processing
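A minimal sketch of the separate-decoding idea, assuming an i.i.d. Bernoulli test design and symmetric noise: each item is classified using only the tests that contain it, here with a simple positive-rate threshold rather than the information-density statistic of Malyutov and Mateev.

```python
# Illustrative only: separate decoding under a Bernoulli(ln2/k) design and
# symmetric (crossover) noise; the threshold is an ad hoc choice.
import numpy as np

def separate_decode(tests, outcomes, n_items, thresh=0.75):
    """tests: (T, n_items) 0/1 design matrix; outcomes: (T,) noisy 0/1 results.
    Each item is decoded on its own: look only at the tests that include it
    and flag it defective if 'enough' of those tests came back positive."""
    defective = []
    for j in range(n_items):
        mask = tests[:, j] == 1
        if mask.sum() > 0 and outcomes[mask].mean() >= thresh:
            defective.append(j)
    return defective

# Toy usage: 3 defectives among 100 items, 300 tests, 5% crossover noise.
rng = np.random.default_rng(0)
n, k, T = 100, 3, 300
truth = rng.choice(n, size=k, replace=False)
tests = (rng.random((T, n)) < (np.log(2) / k)).astype(int)
clean = (tests[:, truth].sum(axis=1) > 0).astype(int)
outcomes = clean ^ (rng.random(T) < 0.05).astype(int)   # symmetric noise
print(sorted(truth), separate_decode(tests, outcomes, n))
```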
On the linear convergence of the stochastic gradient method with constant step-size
The strong growth condition (SGC) is known to be a sufficient condition for
linear convergence of the stochastic gradient method using a constant step-size
(SGM-CS). In this paper, we provide a necessary condition, for the
linear convergence of SGM-CS, that is weaker than SGC. Moreover, when this
necessary condition is violated up to an additive perturbation, we show that both
the projected stochastic gradient method using a constant step-size (PSGM-CS)
and the proximal stochastic gradient method exhibit linear convergence to a
noise-dominated region, whose distance to the optimal solution is proportional
to the size of the perturbation.
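The behavior described above can be illustrated on a toy least-squares problem; the step-size, problem size, and noise level below are arbitrary choices, not the conditions of the analysis.

```python
# Minimal sketch of SGM with a constant step-size (SGM-CS). With zero residual
# noise (interpolation, a strong-growth-like regime) the iterates converge
# linearly; otherwise they settle in a noise-dominated region around the optimum.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.1 * rng.standard_normal(n)          # set 0.0 for interpolation

x = np.zeros(d)
step = 0.5 / np.max(np.linalg.norm(A, axis=1) ** 2)    # constant step-size
for t in range(5000):
    i = rng.integers(n)                                # sample one data point
    grad_i = (A[i] @ x - b[i]) * A[i]                  # stochastic gradient
    x -= step * grad_i

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
print("distance to least-squares solution:", np.linalg.norm(x - x_star))
```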
Randomized Low-Memory Singular Value Projection
Affine rank minimization algorithms typically rely on calculating the
gradient of a data error followed by a singular value decomposition at every
iteration. Because these two steps are expensive, heuristic approximations are
often used to reduce computational burden. To this end, we propose a recovery
scheme that merges the two steps with randomized approximations, and as a
result, operates on space proportional to the degrees of freedom in the
problem. We theoretically establish the estimation guarantees of the algorithm
as a function of approximation tolerance. While the theoretical approximation
requirements are overly pessimistic, we demonstrate that in practice the
algorithm performs well on the quantum tomography recovery problem.
Comment: 13 pages. This version has a revised theorem and new numerical experiments
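A minimal sketch, assuming a matrix-completion instance: singular value projection where the per-iteration truncated SVD is replaced by a randomized range-finder. The step-size and sketch size are illustrative, and unlike the proposed scheme the full iterate is stored explicitly here.

```python
# Illustrative randomized SVP for low-rank matrix completion; not the paper's
# low-memory implementation, which avoids forming the full n x n iterate.
import numpy as np

def rand_svd_project(M, r, oversample=5, seed=0):
    """Rank-r projection of M via a randomized range finder (Halko et al. style)."""
    rng = np.random.default_rng(seed)
    Y = M @ rng.standard_normal((M.shape[1], r + oversample))
    Y = M @ (M.T @ Y)                                   # one power iteration
    Q, _ = np.linalg.qr(Y)
    U, s, Vt = np.linalg.svd(Q.T @ M, full_matrices=False)
    return (Q @ U[:, :r]) * s[:r] @ Vt[:r]

rng = np.random.default_rng(1)
n, r = 100, 3
L = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # rank-3 target
mask = rng.random((n, n)) < 0.3                                  # observed entries
step = 1.0 / mask.mean()                                         # SVP-style step

X = np.zeros((n, n))
for t in range(100):
    grad = (X - L) * mask                  # gradient of 0.5*||P_Omega(X - L)||_F^2
    X = rand_svd_project(X - step * grad, r, seed=t)
print("relative error:", np.linalg.norm(X - L) / np.linalg.norm(L))
```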
Faster Coordinate Descent via Adaptive Importance Sampling
Coordinate descent methods employ random partial updates of decision
variables in order to solve huge-scale convex optimization problems. In this
work, we introduce new adaptive rules for the random selection of their
updates. By adaptive, we mean that our selection rules are based on the dual
residual or the primal-dual gap estimates and can change at each iteration. We
theoretically characterize the performance of our selection rules and
demonstrate improvements over the state-of-the-art, and extend our theory and
algorithms to general convex objectives. Numerical evidence with hinge-loss
support vector machines and Lasso confirms that the practice follows the theory.
Comment: appearing at AISTATS 201
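A minimal sketch of adaptive (non-uniform) coordinate selection on a ridge-regression objective: coordinates are drawn in proportion to the magnitude of their partial derivative, which stands in for the dual-residual and gap-based rules developed in the paper.

```python
# Illustrative adaptive-sampling coordinate descent; the full gradient is
# recomputed each step for clarity, which a practical variant would avoid.
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 50, 0.1
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
col_sq = (A ** 2).sum(axis=0) + lam        # per-coordinate curvature

x = np.zeros(d)
residual = A @ x - b
for t in range(2000):
    grad = A.T @ residual + lam * x        # partial derivatives of the objective
    p = np.abs(grad) / np.abs(grad).sum()  # adaptive sampling distribution
    j = rng.choice(d, p=p)
    delta = -grad[j] / col_sq[j]           # exact minimization along coordinate j
    x[j] += delta
    residual += delta * A[:, j]
print("objective:", 0.5 * residual @ residual + 0.5 * lam * x @ x)
```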
Phase Transitions in the Pooled Data Problem
In this paper, we study the pooled data problem of identifying the labels
associated with a large collection of items, based on a sequence of pooled
tests revealing the counts of each label within the pool. In the noiseless
setting, we identify an exact asymptotic threshold on the required number of
tests with optimal decoding, and prove a phase transition between complete
success and complete failure. In addition, we present a novel noisy variation
of the problem, and provide an information-theoretic framework for
characterizing the required number of tests for general random noise models.
Our results reveal that noise can make the problem considerably more difficult,
with strict increases in the scaling laws even at low noise levels. Finally, we
demonstrate similar behavior in an approximate recovery setting, where a given
number of errors is allowed in the decoded labels.
Comment: Accepted to NIPS 201
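The observation model is easy to simulate; the sketch below generates pooled counts for two labels and applies a crude least-squares-and-round decoder, which is purely illustrative and not the optimal decoder analyzed above.

```python
# Illustrative pooled-data model with two labels: each test reveals how many
# label-1 items fall in a random pool; the decoder here is a naive baseline.
import numpy as np

rng = np.random.default_rng(0)
n, T = 60, 80
labels = (rng.random(n) < 0.3).astype(int)        # hidden 0/1 labels
pools = (rng.random((T, n)) < 0.5).astype(int)    # random pooling design
counts = pools @ labels                           # noiseless label counts per pool
# A noisy variant would perturb `counts` according to some random noise model.

est = np.linalg.lstsq(pools, counts, rcond=None)[0]
decoded = (est > 0.5).astype(int)
print("label errors:", int((decoded != labels).sum()))
```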
Time-Varying Gaussian Process Bandit Optimization
We consider the sequential Bayesian optimization problem with bandit
feedback, adopting a formulation that allows for the reward function to vary
with time. We model the reward function using a Gaussian process whose
evolution obeys a simple Markov model. We introduce two natural extensions of
the classical Gaussian process upper confidence bound (GP-UCB) algorithm. The
first, R-GP-UCB, resets GP-UCB at regular intervals. The second, TV-GP-UCB,
instead forgets about old data in a smooth fashion. Our main contribution
comprises novel regret bounds for these algorithms, providing an explicit
characterization of the trade-off between the time horizon and the rate at
which the function varies. We illustrate the performance of the algorithms on
both synthetic and real data, and we find the gradual forgetting of TV-GP-UCB
to perform favorably compared to the sharp resetting of R-GP-UCB. Moreover,
both algorithms significantly outperform classical GP-UCB, which treats
stale and fresh data equally.
Comment: To appear in AISTATS 201
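A minimal sketch of the smooth-forgetting idea on a finite arm set: the spatial kernel is damped by a factor (1 - eps)^{|t - t'|/2} between observations made at different times, so stale data is down-weighted. The kernel, eps, and exploration parameter below are illustrative, not the paper's exact choices.

```python
# Illustrative TV-GP-UCB-style bandit loop on a 1-D grid of arms, NumPy only.
import numpy as np

rng = np.random.default_rng(0)
arms = np.linspace(0, 1, 50)
eps, noise, beta = 0.03, 0.1, 2.0

def k_space(a, b):
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / 0.02)

X, times, Y = [], [], []
for t in range(100):
    if X:
        x, tt, y = np.array(X), np.array(times), np.array(Y)
        fade = (1 - eps) ** (np.abs(tt[:, None] - tt[None, :]) / 2)
        K = k_space(x, x) * fade                          # spatio-temporal kernel
        ks = k_space(arms, x) * (1 - eps) ** ((t - tt)[None, :] / 2)
        Kinv = np.linalg.inv(K + noise ** 2 * np.eye(len(x)))
        mu = ks @ Kinv @ y
        var = 1.0 - np.einsum('ij,jk,ik->i', ks, Kinv, ks)
        ucb = mu + beta * np.sqrt(np.maximum(var, 0))     # upper confidence bound
    else:
        ucb = np.zeros_like(arms)
    a = int(np.argmax(ucb))
    # Drifting reward function (hidden from the learner): a slowly moving peak.
    reward = np.exp(-((arms[a] - (0.3 + 0.4 * t / 100)) ** 2) / 0.02)
    X.append(arms[a]); times.append(t); Y.append(reward + noise * rng.standard_normal())
print("last chosen arm:", arms[a])
```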
