
    On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization

    Conventional wisdom in deep learning states that increasing depth improves expressiveness but complicates optimization. This paper suggests that, sometimes, increasing depth can speed up optimization. The effect of depth on optimization is decoupled from expressiveness by focusing on settings where additional layers amount to overparameterization - linear neural networks, a well-studied model. Theoretical analysis, as well as experiments, show that here depth acts as a preconditioner which may accelerate convergence. Even on simple convex problems such as linear regression with $\ell_p$ loss, $p > 2$, gradient descent can benefit from transitioning to a non-convex overparameterized objective, more than it would from some common acceleration schemes. We also prove that it is mathematically impossible to obtain the acceleration effect of overparameterization via gradients of any regularizer. Comment: Published at the International Conference on Machine Learning (ICML) 2018.
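
    The minimal Python sketch below illustrates the idea rather than reproducing the paper's experiments: the same $\ell_4$ regression (an instance of the abstract's $\ell_p$ loss with $p > 2$) is solved by gradient descent (a) directly on the convex objective and (b) after reparameterizing the weight vector as a product $W_2 w_1$, i.e. a depth-2 linear network. The synthetic data, initialization, step size, and iteration count are assumptions chosen for illustration only.

# Illustrative sketch (not the paper's code): plain vs. overparameterized
# gradient descent on an l_4 linear regression.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star

def loss_and_grad(w):
    # l_4 regression loss and its gradient with respect to the end-to-end weights
    r = X @ w - y
    return np.mean(r ** 4), 4.0 * X.T @ (r ** 3) / n

lr, steps = 1e-3, 500

# (a) Gradient descent directly on the convex objective.
w = np.zeros(d)
for _ in range(steps):
    _, g = loss_and_grad(w)
    w -= lr * g

# (b) Gradient descent on the overparameterized, non-convex objective,
#     where the end-to-end weights are the product W2 @ w1.
w1, W2 = np.zeros(d), np.eye(d)
for _ in range(steps):
    _, g = loss_and_grad(W2 @ w1)          # gradient w.r.t. the end-to-end weights
    w1, W2 = w1 - lr * (W2.T @ g), W2 - lr * np.outer(g, w1)

print("depth-1 final l_4 loss:", loss_and_grad(w)[0])
print("depth-2 final l_4 loss:", loss_and_grad(W2 @ w1)[0])

    In the paper's analysis, the depth-2 parameterization induces a preconditioned update on the end-to-end weights, which is where any speed-up comes from.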

    Learning mixtures of separated nonspherical Gaussians

    Mixtures of Gaussian (or normal) distributions arise in a variety of application areas. Many heuristics have been proposed for the task of finding the component Gaussians given samples from the mixture, such as the EM algorithm, a local-search heuristic from Dempster, Laird and Rubin [J. Roy. Statist. Soc. Ser. B 39 (1977) 1-38]. These do not provably run in polynomial time. We present the first algorithm that provably learns the component Gaussians in time that is polynomial in the dimension. The Gaussians may have arbitrary shape, but they must satisfy a "separation condition" which places a lower bound on the distance between the centers of any two component Gaussians. The mathematical results at the heart of our proof are "distance concentration" results, proved using isoperimetric inequalities, which establish bounds on the probability distribution of the distance between a pair of points generated according to the mixture. We also formalize the more general problem of maximum-likelihood fit of a Gaussian mixture to unstructured data. Comment: Published at http://dx.doi.org/10.1214/105051604000000512 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org).
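
    As a concrete baseline, the short Python sketch below fits samples drawn from two well-separated, nonspherical Gaussians using scikit-learn's GaussianMixture, which implements the EM heuristic mentioned above; it is not the paper's provably polynomial-time algorithm. The component means, covariances, and sample sizes are arbitrary choices for illustration.

# Hedged illustration: EM (via scikit-learn) on a well-separated,
# nonspherical two-component Gaussian mixture.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Two nonspherical components whose centers are far apart relative to their
# spread (an informal stand-in for the abstract's "separation condition").
mean0, cov0 = np.array([0.0, 0.0]), np.array([[4.0, 1.0], [1.0, 0.5]])
mean1, cov1 = np.array([15.0, 15.0]), np.array([[0.5, -0.3], [-0.3, 3.0]])
X = np.vstack([
    rng.multivariate_normal(mean0, cov0, size=500),
    rng.multivariate_normal(mean1, cov1, size=500),
])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)

print("estimated means:\n", gmm.means_)
print("estimated covariances:\n", gmm.covariances_)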

    Provable ICA with Unknown Gaussian Noise, and Implications for Gaussian Mixtures and Autoencoders

    We present a new algorithm for Independent Component Analysis (ICA) which has provable performance guarantees. In particular, suppose we are given samples of the form $y = Ax + \eta$, where $A$ is an unknown $n \times n$ matrix, $x$ is a random variable whose components are independent and have a fourth moment strictly less than that of a standard Gaussian random variable, and $\eta$ is an $n$-dimensional Gaussian random variable with unknown covariance $\Sigma$. We give an algorithm that provably recovers $A$ and $\Sigma$ up to an additive $\epsilon$, and whose running time and sample complexity are polynomial in $n$ and $1/\epsilon$. To accomplish this, we introduce a novel "quasi-whitening" step that may be useful in other contexts in which the covariance of the Gaussian noise is not known in advance. We also give a general framework for finding all local optima of a function, given an oracle for approximately finding just one; this is a crucial step in our algorithm, one that has been overlooked in previous attempts, and it allows us to control the accumulation of error when we find the columns of $A$ one by one via local search.
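
    The sketch below is illustrative rather than the paper's method: it generates data from the model $y = Ax + \eta$ with independent sub-Gaussian (uniform) sources and additive Gaussian noise, then runs scikit-learn's FastICA, which assumes the noise-free model, as a point of comparison. Recovering $A$ and $\Sigma$ under large unknown Gaussian noise is exactly what the quasi-whitening step is designed to handle. Dimensions, the noise scale, and the choice of uniform sources are assumptions.

# Hedged sketch of the generative model y = Ax + eta, with standard FastICA
# as a noise-free baseline (NOT the paper's quasi-whitening algorithm).
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n, m = 4, 20000                              # dimension and number of samples

A = rng.normal(size=(n, n))                  # unknown mixing matrix
# Independent sources whose fourth moment is below the Gaussian's
# (uniform on [-sqrt(3), sqrt(3)] has unit variance and negative excess kurtosis).
x = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(m, n))
L = 0.1 * rng.normal(size=(n, n))
Sigma = L @ L.T                              # unknown Gaussian noise covariance
eta = rng.multivariate_normal(np.zeros(n), Sigma, size=m)
Y = x @ A.T + eta                            # one sample y = A x + eta per row

ica = FastICA(n_components=n, random_state=0)
ica.fit(Y)

# Up to column permutation and scaling, ica.mixing_ approximates A when the
# noise is small; handling large unknown Gaussian noise requires more, which
# is what the paper's quasi-whitening step addresses.
print("true mixing matrix A:\n", A)
print("estimated mixing matrix:\n", ica.mixing_)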