On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization
Conventional wisdom in deep learning states that increasing depth improves
expressiveness but complicates optimization. This paper suggests that,
sometimes, increasing depth can speed up optimization. The effect of depth on
optimization is decoupled from expressiveness by focusing on settings where
additional layers amount to overparameterization - linear neural networks, a
well-studied model. Theoretical analysis, as well as experiments, show that
here depth acts as a preconditioner which may accelerate convergence. Even on
simple convex problems such as linear regression with $\ell_p$ loss, $p > 2$,
gradient descent can benefit from transitioning to a non-convex
overparameterized objective, more than it would from some common acceleration
schemes. We also prove that it is mathematically impossible to obtain the
acceleration effect of overparameterization via gradients of any regularizer.
Comment: Published at the International Conference on Machine Learning (ICML) 2018
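The following is a minimal sketch, in Python/NumPy, of the kind of comparison the abstract describes: gradient descent on linear regression with $\ell_4$ loss, run once on the plain parameterization and once on a depth-2 reparameterization $w = W_2 w_1$. It is not the paper's experiment; problem size, step size, iteration count, and initialization are illustrative assumptions, and which run converges faster depends on those choices.

import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true

def l4_loss_grad(w):
    r = X @ w - y                          # residuals
    return np.mean(r ** 4), X.T @ (4 * r ** 3) / n

# Plain gradient descent on w.
w, lr = np.zeros(d), 1e-3
for _ in range(5000):
    w -= lr * l4_loss_grad(w)[1]

# Depth-2 reparameterization w = W2 @ w1, gradient descent on (W2, w1).
W2, w1 = np.eye(d), np.zeros(d)            # illustrative initialization
for _ in range(5000):
    _, g = l4_loss_grad(W2 @ w1)           # gradient w.r.t. the end-to-end vector
    # Chain rule: dL/dW2 = g w1^T, dL/dw1 = W2^T g (simultaneous update).
    W2, w1 = W2 - lr * np.outer(g, w1), w1 - lr * (W2.T @ g)

print("plain GD loss:          ", l4_loss_grad(w)[0])
print("overparameterized loss: ", l4_loss_grad(W2 @ w1)[0])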
Learning mixtures of separated nonspherical Gaussians
Mixtures of Gaussian (or normal) distributions arise in a variety of
application areas. Many heuristics have been proposed for the task of finding
the component Gaussians given samples from the mixture, such as the EM
algorithm, a local-search heuristic from Dempster, Laird and Rubin [J. Roy.
Statist. Soc. Ser. B 39 (1977) 1-38]. These do not provably run in polynomial
time. We present the first algorithm that provably learns the component
Gaussians in time that is polynomial in the dimension. The Gaussians may have
arbitrary shape, but they must satisfy a ``separation condition'' which places
a lower bound on the distance between the centers of any two component
Gaussians. The mathematical results at the heart of our proof are ``distance
concentration'' results--proved using isoperimetric inequalities--which
establish bounds on the probability distribution of the distance between a pair
of points generated according to the mixture. We also formalize the more
general problem of max-likelihood fit of a Gaussian mixture to unstructured
data.
Comment: Published at http://dx.doi.org/10.1214/105051604000000512 in the
Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute
of Mathematical Statistics (http://www.imstat.org)
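As a point of reference for the EM heuristic mentioned above (the local-search baseline, not the paper's provable algorithm), here is a minimal Python sketch of EM for a two-component nonspherical Gaussian mixture on synthetic, well-separated data. Dimension, separation, initialization, and iteration count are illustrative assumptions.

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
d, n, k = 5, 2000, 2

# Two nonspherical components with well-separated means.
means = [np.zeros(d), 8.0 * np.ones(d)]
covs = [np.diag(rng.uniform(0.5, 3.0, d)), np.diag(rng.uniform(0.5, 3.0, d))]
labels = rng.integers(0, k, n)
X = np.array([rng.multivariate_normal(means[z], covs[z]) for z in labels])

# EM for a k-component Gaussian mixture.
weights = np.full(k, 1.0 / k)
mu = X[rng.choice(n, k, replace=False)]      # random initial centers
sigma = [np.cov(X.T) for _ in range(k)]      # initial covariances

for _ in range(50):
    # E-step: posterior responsibility of each component for each point.
    dens = np.column_stack([
        w * multivariate_normal.pdf(X, m, s)
        for w, m, s in zip(weights, mu, sigma)
    ])
    resp = dens / dens.sum(axis=1, keepdims=True)

    # M-step: re-estimate weights, means, covariances from responsibilities.
    nk = resp.sum(axis=0)
    weights = nk / n
    mu = (resp.T @ X) / nk[:, None]
    sigma = [
        ((resp[:, j, None] * (X - mu[j])).T @ (X - mu[j])) / nk[j]
        for j in range(k)
    ]

print("estimated means:\n", np.round(mu, 2))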
Provable ICA with Unknown Gaussian Noise, and Implications for Gaussian Mixtures and Autoencoders
We present a new algorithm for Independent Component Analysis (ICA) which has
provable performance guarantees. In particular, suppose we are given samples of
the form $y = Ax + \eta$ where $A$ is an unknown $n \times n$ matrix and $x$ is
a random variable whose components are independent and have a fourth moment
strictly less than that of a standard Gaussian random variable and $\eta$ is an
$n$-dimensional Gaussian random variable with unknown covariance $\Sigma$: We
give an algorithm that provably recovers $A$ and $\Sigma$ up to an additive
$\epsilon$ and whose running time and sample complexity are polynomial in
$n$ and $1/\epsilon$. To accomplish this, we introduce a novel "quasi-whitening"
step that may be useful in other contexts in which the covariance of Gaussian
noise is not known in advance. We also give a general framework for finding all
local optima of a function (given an oracle for approximately finding just one)
and this is a crucial step in our algorithm, one that has been overlooked in
previous attempts, and allows us to control the accumulation of error when we
find the columns of $A$ one by one via local search.
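Below is a minimal Python sketch of the generative model in this abstract, with Rademacher sources (fourth moment 1, strictly below the standard Gaussian's 3) and additive Gaussian noise with a random covariance. For recovery it calls scikit-learn's FastICA as a stand-in heuristic; this is not the paper's algorithm and carries no guarantees in the unknown-Gaussian-noise setting. All sizes are illustrative assumptions.

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(2)
n, m = 8, 20000                            # dimension and sample count

A = rng.normal(size=(n, n))                # unknown mixing matrix
x = rng.choice([-1.0, 1.0], size=(m, n))   # independent +/-1 sources
B = rng.normal(size=(n, n))
Sigma = 0.1 * (B @ B.T) / n                # unknown noise covariance
eta = rng.multivariate_normal(np.zeros(n), Sigma, size=m)

Y = x @ A.T + eta                          # observed samples y = A x + eta

ica = FastICA(n_components=n, random_state=0, max_iter=1000)
ica.fit(Y)
A_hat = ica.mixing_                        # defined only up to permutation, sign and scale

print("A_hat shape:", A_hat.shape)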
