Sparse and spurious: dictionary learning with noise and outliers
A popular approach within the signal processing and machine learning
communities consists in modelling signals as sparse linear combinations of
atoms selected from a learned dictionary. While this paradigm has led to
numerous empirical successes in various fields ranging from image to audio
processing, only a few theoretical arguments support this empirical evidence. In particular, sparse coding, or sparse dictionary learning, relies
on a non-convex procedure whose local minima have not been fully analyzed yet.
In this paper, we consider a probabilistic model of sparse signals, and show
that, with high probability, sparse coding admits a local minimum around the
reference dictionary generating the signals. Our study takes into account the
case of over-complete dictionaries, noisy signals, and possible outliers, thus
extending previous work limited to noiseless settings and/or under-complete
dictionaries. The analysis we conduct is non-asymptotic and makes it possible
to understand how the key quantities of the problem, such as the coherence or
the level of noise, can scale with respect to the dimension of the signals, the
number of atoms, the sparsity and the number of observations.Comment: This is a substantially revised version of a first draft that
appeared as a preprint titled "Local stability and robustness of sparse
dictionary learning in the presence of noise",
http://hal.inria.fr/hal-00737152, IEEE Transactions on Information Theory,
Institute of Electrical and Electronics Engineers (IEEE), 2015, pp.2
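To make the objective discussed above concrete, here is a minimal NumPy sketch of the empirical sparse coding cost, assuming the usual l_1-penalized least-squares formulation; the exact penalty, normalization and constraint set analyzed in the paper may differ, and all names (ista, sparse_coding_cost, lam) are illustrative.

```python
import numpy as np

def ista(D, x, lam, n_iter=200):
    """Approximate argmin_a 0.5*||x - D a||^2 + lam*||a||_1 (plain ISTA)."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        z = a - grad / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return a

def sparse_coding_cost(D, X, lam):
    """Empirical sparse coding cost: average per-signal l_1-penalized fit."""
    cost = 0.0
    for x in X.T:                            # signals stored as columns of X
        a = ista(D, x, lam)
        cost += 0.5 * np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a))
    return cost / X.shape[1]

# Toy usage: noisy sparse signals from a random, column-normalized dictionary.
rng = np.random.default_rng(0)
m, p, n, k = 20, 40, 500, 3                  # signal dim, atoms, samples, sparsity
D_ref = rng.standard_normal((m, p))
D_ref /= np.linalg.norm(D_ref, axis=0)
A = np.zeros((p, n))
for j in range(n):
    A[rng.choice(p, k, replace=False), j] = rng.standard_normal(k)
X = D_ref @ A + 0.01 * rng.standard_normal((m, n))
print(sparse_coding_cost(D_ref, X, lam=0.1))
```

In this toy setup one would compare the cost at D_ref with the cost at small perturbations of D_ref, which is the kind of local behaviour the paper analyzes.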
Local stability and robustness of sparse dictionary learning in the presence of noise
A popular approach within the signal processing and machine learning
communities consists in modelling signals as sparse linear combinations of
atoms selected from a learned dictionary. While this paradigm has led to
numerous empirical successes in various fields ranging from image to audio
processing, only a few theoretical arguments support this empirical evidence. In particular, sparse coding, or sparse dictionary learning, relies
on a non-convex procedure whose local minima have not been fully analyzed yet.
In this paper, we consider a probabilistic model of sparse signals, and show
that, with high probability, sparse coding admits a local minimum around the
reference dictionary generating the signals. Our study takes into account the
case of over-complete dictionaries and noisy signals, thus extending previous
work limited to noiseless settings and/or under-complete dictionaries. The
analysis we conduct is non-asymptotic and makes it possible to understand how
the key quantities of the problem, such as the coherence or the level of noise,
can scale with respect to the dimension of the signals, the number of atoms,
the sparsity and the number of observations.
Convex Relaxations for Permutation Problems
Seriation seeks to reconstruct a linear order between variables using
unsorted, pairwise similarity information. It has direct applications in
archeology and shotgun gene sequencing for example. We write seriation as an
optimization problem by proving the equivalence between the seriation and
combinatorial 2-SUM problems on similarity matrices (2-SUM is a quadratic
minimization problem over permutations). The seriation problem can be solved
exactly by a spectral algorithm in the noiseless case and we derive several
convex relaxations for 2-SUM to improve the robustness of seriation solutions
in noisy settings. These convex relaxations also allow us to impose structural
constraints on the solution, hence solve semi-supervised seriation problems. We
derive new approximation bounds for some of these relaxations and present
numerical experiments on archeological data, Markov chains and DNA assembly
from shotgun gene sequencing data.
Comment: Final journal version, a few typos and references fixed.
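As a complement to the abstract, here is a minimal sketch of the spectral step it mentions (sorting the Fiedler vector of the Laplacian of the similarity matrix), under the assumption of a symmetric similarity matrix; the convex relaxations of 2-SUM themselves are not reproduced, and the name spectral_seriation is illustrative.

```python
import numpy as np

def spectral_seriation(A):
    """Order items by sorting the Fiedler vector of the Laplacian of the
    similarity matrix A (the exact spectral solution in the noiseless case)."""
    A = (A + A.T) / 2.0                      # enforce symmetry
    L = np.diag(A.sum(axis=1)) - A           # unnormalized graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                  # eigenvector of 2nd-smallest eigenvalue
    return np.argsort(fiedler)               # permutation recovering the order

# Toy usage: similarity that decays with distance along a hidden linear order.
rng = np.random.default_rng(0)
n = 30
true_order = rng.permutation(n)
pos = np.empty(n)
pos[true_order] = np.arange(n)
A = np.exp(-np.abs(pos[:, None] - pos[None, :]) / 5.0)
perm = spectral_seriation(A)
print(pos[perm])   # monotone, up to the sign ambiguity of the Fiedler vector
```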
Network Flow Algorithms for Structured Sparsity
We consider a class of learning problems that involve a structured
sparsity-inducing norm defined as the sum of l_infinity-norms over groups of
variables. Whereas a lot of effort has been put into developing fast optimization
methods when the groups are disjoint or embedded in a specific hierarchical
structure, we address here the case of general overlapping groups. To this end,
we show that the corresponding optimization problem is related to network flow
optimization. More precisely, the proximal problem associated with the norm we
consider is dual to a quadratic min-cost flow problem. We propose an efficient
procedure which computes its solution exactly in polynomial time. Our algorithm
scales up to millions of variables, and opens up a whole new range of
applications for structured sparse models. We present several experiments on
image and video data, demonstrating the applicability and scalability of our
approach for various problems.
Comment: accepted for publication in Adv. Neural Information Processing Systems, 2010.
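The flow-based computation itself is beyond a short sketch, but the building block it generalizes can be illustrated: for disjoint groups, the proximal operator of a sum of l_infinity-norms decomposes group by group, and each group prox follows from the Moreau decomposition (a vector minus its projection onto the l_1 ball of radius lambda). The sketch below covers only this disjoint case, with illustrative names; the overlapping case is what the paper solves via quadratic min-cost flows.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto the l_1 ball of the given radius."""
    if np.sum(np.abs(v)) <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - radius) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (css[rho] - radius) / (rho + 1)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def prox_linf(v, lam):
    """Prox of lam*||.||_inf via Moreau: v - proj_{||u||_1 <= lam}(v)."""
    return v - project_l1_ball(v, lam)

def prox_sum_linf_disjoint(v, groups, lam):
    """Prox of lam * sum_g ||v_g||_inf for *disjoint* groups (decomposes per group).
    Overlapping groups are the hard case handled by the min-cost flow algorithm."""
    out = v.copy()
    for g in groups:
        out[g] = prox_linf(v[g], lam)
    return out

# Toy usage on two disjoint groups.
v = np.array([3.0, -1.0, 0.5, 2.0, -2.5, 0.1])
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]
print(prox_sum_linf_disjoint(v, groups, lam=1.0))
```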
Convex and Network Flow Optimization for Structured Sparsity
We consider a class of learning problems regularized by a structured
sparsity-inducing norm defined as the sum of l_2- or l_infinity-norms over
groups of variables. Whereas much effort has been put into developing fast
optimization techniques when the groups are disjoint or embedded in a
hierarchy, we address here the case of general overlapping groups. To this end,
we present two different strategies: On the one hand, we show that the proximal
operator associated with a sum of l_infinity-norms can be computed exactly in
polynomial time by solving a quadratic min-cost flow problem, allowing the use
of accelerated proximal gradient methods. On the other hand, we use proximal
splitting techniques, and address an equivalent formulation with
non-overlapping groups, but in higher dimension and with additional
constraints. We propose efficient and scalable algorithms exploiting these two
strategies, which are significantly faster than alternative approaches. We
illustrate these methods with several problems such as CUR matrix
factorization, multi-task learning of tree-structured dictionaries, background
subtraction in video sequences, image denoising with wavelets, and topographic
dictionary learning of natural image patches.
Comment: to appear in the Journal of Machine Learning Research (JMLR).
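The second strategy mentioned above (an equivalent formulation with non-overlapping groups in a higher dimension) can be illustrated with a small replication construction: each variable is copied once per group containing it, so the overlapping-group penalty of w equals a disjoint-group penalty of the lifted vector, at the price of linear constraints tying the copies together. This is a minimal sketch with l_2 group norms and illustrative names; the proximal splitting algorithm that operates on this reformulation is not reproduced here.

```python
import numpy as np

# Overlapping groups on a 4-dimensional variable w.
groups = [np.array([0, 1]), np.array([1, 2, 3])]

def replication_matrix(dim, groups):
    """0/1 matrix R stacking one copy of w_g per group, so the overlapping-group
    norm of w equals a disjoint-group norm of R @ w."""
    rows = []
    for g in groups:
        for i in g:
            e = np.zeros(dim)
            e[i] = 1.0
            rows.append(e)
    return np.vstack(rows)

def overlapping_group_norm(w, groups):
    return sum(np.linalg.norm(w[g], 2) for g in groups)

w = np.array([1.0, -2.0, 0.5, 3.0])
R = replication_matrix(len(w), groups)
z = R @ w                                    # lifted variable with duplicated entries
# In the lifted space the groups are disjoint: consecutive blocks of copies.
sizes = np.cumsum([0] + [len(g) for g in groups])
lifted = sum(np.linalg.norm(z[sizes[k]:sizes[k + 1]], 2) for k in range(len(groups)))
print(overlapping_group_norm(w, groups), lifted)   # identical values
```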
Optimization with Sparsity-Inducing Penalties
Sparse estimation methods are aimed at using or obtaining parsimonious
representations of data or models. They were first dedicated to linear variable
selection but numerous extensions have now emerged such as structured sparsity
or kernel selection. It turns out that many of the related estimation problems
can be cast as convex optimization problems by regularizing the empirical risk
with appropriate non-smooth norms. The goal of this paper is to present from a
general perspective optimization tools and techniques dedicated to such
sparsity-inducing penalties. We cover proximal methods, block-coordinate
descent, reweighted l_2-penalized techniques, working-set and homotopy
methods, as well as non-convex formulations and extensions, and provide an
extensive set of experiments to compare various algorithms from a computational
point of view.
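As one concrete instance of the techniques surveyed, here is a minimal coordinate descent sketch for the l_1-regularized least-squares problem, where each coordinate update is a closed-form soft-thresholding step; the setup and names are illustrative and do not follow the paper's own code or experiments.

```python
import numpy as np

def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Coordinate descent for min_w 0.5*||y - X w||^2 + lam*||w||_1:
    each coordinate update is a closed-form soft-thresholding step."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = np.sum(X ** 2, axis=0)
    r = y - X @ w                            # current residual
    for _ in range(n_sweeps):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * w[j]              # remove coordinate j from the residual
            rho = X[:, j] @ r
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * w[j]              # put the updated coordinate back
    return w

# Toy usage: sparse regression problem.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
w_true = np.zeros(20)
w_true[[2, 7, 11]] = [1.5, -2.0, 0.7]
y = X @ w_true + 0.05 * rng.standard_normal(100)
print(np.round(lasso_coordinate_descent(X, y, lam=5.0), 2))
```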
Sample Complexity of Dictionary Learning and other Matrix Factorizations
Many modern tools in machine learning and signal processing, such as sparse
dictionary learning, principal component analysis (PCA), non-negative matrix
factorization (NMF), K-means clustering, etc., rely on the factorization of a
matrix obtained by concatenating high-dimensional vectors from a training
collection. While the idealized task would be to optimize the expected quality
of the factors over the underlying distribution of training vectors, it is
achieved in practice by minimizing an empirical average over the considered
collection. The focus of this paper is to provide sample complexity estimates
to uniformly control how much the empirical average deviates from the expected
cost function. Standard arguments imply that the performance of the empirical
predictor also exhibits such guarantees. The level of genericity of the approach
encompasses several possible constraints on the factors (tensor product
structure, shift-invariance, sparsity, etc.), thus providing a unified
perspective on the sample complexity of several widely used matrix
factorization schemes. The derived generalization bounds behave proportionally to $\sqrt{\log(n)/n}$ with respect to the number of samples $n$ for the considered matrix factorization techniques.
Comment: to appear.
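In schematic form, with notation that is illustrative rather than the paper's ($\ell(x, D)$ a factorization loss, $D$ ranging over the constraint class $\mathcal{D}$, $n$ the number of samples), the uniform deviation guarantee described above reads roughly as

```latex
\sup_{D \in \mathcal{D}}
\left| \frac{1}{n}\sum_{i=1}^{n} \ell(x_i, D) \;-\; \mathbb{E}_{x}\,\ell(x, D) \right|
\;=\; O\!\left(\sqrt{\tfrac{\log n}{n}}\right).
```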
