A Proximal-Gradient Homotopy Method for the Sparse Least-Squares Problem
We consider solving the $\ell_1$-regularized least-squares ($\ell_1$-LS)
problem in the context of sparse recovery, for applications such as compressed
sensing. The standard proximal gradient method, also known as iterative
soft-thresholding when applied to this problem, has low computational cost per
iteration but a rather slow convergence rate. Nevertheless, when the solution
is sparse, it often exhibits fast linear convergence in the final stage. We
exploit the local linear convergence using a homotopy continuation strategy,
i.e., we solve the $\ell_1$-LS problem for a sequence of decreasing values of
the regularization parameter, and use an approximate solution at the end of
each stage to warm start the next stage. Although similar strategies have been
studied in the literature, there has been no theoretical analysis of their
global iteration complexity. This paper shows that under suitable assumptions
for sparse recovery, the proposed homotopy strategy ensures that all iterates
along the homotopy solution path are sparse. Therefore the objective function
is effectively strongly convex along the solution path, and geometric
convergence at each stage can be established. As a result, the overall
iteration complexity of our method is $O(\log(1/\epsilon))$ for finding an
$\epsilon$-optimal solution, which can be interpreted as a global geometric rate
of convergence. We also present empirical results to support our theoretical
analysis.
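
A minimal sketch of the homotopy idea in Python: proximal gradient (iterative soft-thresholding) at each fixed regularization level, with the regularization parameter decreased geometrically and each stage warm-started from the previous one. The function names, the decrease factor eta, and the stage stopping rule delta*lam are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal mapping of t*||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_grad_stage(A, b, lam, x, step, tol, max_iter=1000):
    """Run proximal gradient (ISTA) at a fixed lam until successive iterates
    are within tol; return the approximate stage solution."""
    for _ in range(max_iter):
        grad = A.T @ (A @ x - b)
        x_new = soft_threshold(x - step * grad, step * lam)
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x

def homotopy_l1_ls(A, b, lam_target, eta=0.7, delta=0.2):
    """Solve min_x 0.5*||Ax - b||^2 + lam*||x||_1 by decreasing lam
    geometrically from lam_max = ||A^T b||_inf down to lam_target,
    warm-starting each stage from the previous one."""
    n = A.shape[1]
    x = np.zeros(n)
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L with L = ||A||_2^2
    lam = np.max(np.abs(A.T @ b))            # x = 0 is optimal at lam_max
    while lam > lam_target:
        lam = max(eta * lam, lam_target)
        x = prox_grad_stage(A, b, lam, x, step, tol=delta * lam)
    return x

# Tiny sparse-recovery example.
rng = np.random.default_rng(0)
A = rng.standard_normal((80, 200))
x_true = np.zeros(200); x_true[:5] = rng.standard_normal(5)
b = A @ x_true + 0.01 * rng.standard_normal(80)
x_hat = homotopy_l1_ls(A, b, lam_target=0.1)
print("nonzeros recovered:", np.count_nonzero(np.abs(x_hat) > 1e-6))
```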
A Proximal Stochastic Gradient Method with Progressive Variance Reduction
We consider the problem of minimizing the sum of two convex functions: one is
the average of a large number of smooth component functions, and the other is a
general convex function that admits a simple proximal mapping. We assume the
whole objective function is strongly convex. Such problems often arise in
machine learning, known as regularized empirical risk minimization. We propose
and analyze a new proximal stochastic gradient method, which uses a multi-stage
scheme to progressively reduce the variance of the stochastic gradient. While
each iteration of this algorithm has a cost similar to that of the classical stochastic
gradient method (or incremental gradient method), we show that the expected
objective value converges to the optimum at a geometric rate. The overall
complexity of this method is much lower than that of both the proximal full
gradient method and the standard proximal stochastic gradient method.
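
A minimal sketch of the multi-stage variance-reduction scheme described above (in the spirit of Prox-SVRG), applied here to least squares with an $\ell_1$ regularizer. The inner-loop length m = 2n and the step size in the demo are common illustrative choices, not the paper's prescribed constants.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_svrg(A, b, lam, step, n_stages=20, m=None, seed=0):
    """min_x (1/n) * sum_i 0.5*(a_i^T x - b_i)^2 + lam*||x||_1.
    Each stage recomputes the full gradient at a snapshot x_snap, then runs
    m inner steps with the variance-reduced gradient
        v = grad_i(x) - grad_i(x_snap) + full_grad(x_snap),
    followed by the proximal (soft-thresholding) step."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    m = m or 2 * n                      # inner-loop length (a common choice)
    x = np.zeros(d)
    for _ in range(n_stages):
        x_snap = x.copy()
        full_grad = A.T @ (A @ x_snap - b) / n
        for _ in range(m):
            i = rng.integers(n)
            a_i = A[i]
            # Component gradients at the current point and at the snapshot.
            g_x = (a_i @ x - b[i]) * a_i
            g_snap = (a_i @ x_snap - b[i]) * a_i
            v = g_x - g_snap + full_grad    # unbiased, with shrinking variance
            x = soft_threshold(x - step * v, step * lam)
    return x

# Tiny demo with a conservative step size.
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 20)); b = A @ rng.standard_normal(20)
step = 0.1 / np.max(np.sum(A * A, axis=1))
x_hat = prox_svrg(A, b, lam=0.1, step=step)
```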
Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization
We consider a generic convex optimization problem associated with regularized
empirical risk minimization of linear predictors. The problem structure allows
us to reformulate it as a convex-concave saddle point problem. We propose a
stochastic primal-dual coordinate (SPDC) method, which alternates between
maximizing over a randomly chosen dual variable and minimizing over the primal
variable. An extrapolation step on the primal variable is performed to obtain
an accelerated convergence rate. We also develop a mini-batch version of the SPDC
method which facilitates parallel computing, and an extension with weighted
sampling probabilities on the dual variables, which has a better complexity
than uniform sampling on unnormalized data. Both theoretically and empirically,
we show that the SPDC method has comparable or better performance than several
state-of-the-art optimization methods.
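
A minimal sketch of a stochastic primal-dual coordinate update of the kind described above, instantiated for ridge regression written as a saddle-point problem over (x, y): a randomly chosen dual coordinate is maximized in closed form, the primal variable takes a proximal step, and an extrapolation step on the primal variable follows. The step sizes sigma, tau and the extrapolation weight theta below are SPDC-style choices whose exact constants are illustrative assumptions.

```python
import numpy as np

def spdc_ridge(A, b, gamma, n_iters=20000, seed=0):
    """min_x (1/n) sum_i 0.5*(a_i^T x - b_i)^2 + (gamma/2)*||x||^2."""
    n, d = A.shape
    R = np.max(np.linalg.norm(A, axis=1))               # max row norm
    sigma = 0.5 * np.sqrt(n * gamma) / R                # dual step size
    tau = 0.5 / (R * np.sqrt(n * gamma))                # primal step size
    theta = 1.0 - 1.0 / (n + 2 * R * np.sqrt(n / gamma))  # extrapolation weight
    rng = np.random.default_rng(seed)
    x = np.zeros(d); x_bar = x.copy()
    y = np.zeros(n)
    u = np.zeros(d)                  # maintains u = (1/n) * sum_i y_i * a_i
    for _ in range(n_iters):
        i = rng.integers(n)
        a_i = A[i]
        # Dual ascent in coordinate i: closed form for the square loss, whose
        # conjugate is phi_i^*(y) = 0.5*y^2 + b_i*y.
        y_new = (y[i] + sigma * (a_i @ x_bar - b[i])) / (1.0 + sigma)
        u += (y_new - y[i]) * a_i / n
        y[i] = y_new
        # Primal proximal step for g(x) = (gamma/2)*||x||^2.
        x_new = (x - tau * u) / (1.0 + tau * gamma)
        # Extrapolation on the primal variable for acceleration.
        x_bar = x_new + theta * (x_new - x)
        x = x_new
    return x

# Tiny demo.
rng = np.random.default_rng(2)
A = rng.standard_normal((200, 10)); b = A @ rng.standard_normal(10)
x_hat = spdc_ridge(A, b, gamma=0.1)
```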
