
    Estimation of high-dimensional low-rank matrices

    Suppose that we observe entries or, more generally, linear combinations of entries of an unknown $m\times T$ matrix $A$ corrupted by noise. We are particularly interested in the high-dimensional setting where the number $mT$ of unknown entries can be much larger than the sample size $N$. Motivated by several applications, we consider estimation of the matrix $A$ under the assumption that it has small rank. This can be viewed as a dimension reduction or sparsity assumption. In order to shrink toward a low-rank representation, we investigate penalized least squares estimators with a Schatten-$p$ quasi-norm penalty term, $p\leq 1$. We study these estimators under two possible assumptions: a modified version of the restricted isometry condition and a uniform bound on the ratio "empirical norm induced by the sampling operator/Frobenius norm." The main results are stated as nonasymptotic upper bounds on the prediction risk and on the Schatten-$q$ risk of the estimators, where $q\in[p,2]$. The rates that we obtain for the prediction risk are of the form $rm/N$ (for $m=T$), up to logarithmic factors, where $r$ is the rank of $A$. The particular examples of multi-task learning and matrix completion are worked out in detail. The proofs are based on tools from the theory of empirical processes. As a by-product, we derive bounds for the $k$th entropy numbers of the quasi-convex Schatten class embeddings $S_p^M\hookrightarrow S_2^M$, $p<1$, which are of independent interest.
    Published at http://dx.doi.org/10.1214/10-AOS860 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
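    The abstract studies Schatten-$p$ penalties for $p\leq 1$; the sketch below illustrates only the convex endpoint $p=1$ (the nuclear norm) for the matrix completion example, using proximal gradient descent with singular value soft-thresholding. It is a minimal illustration, not the paper's estimator: the function names, step size, penalty level, and toy data are all assumptions made for the example.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt

def complete_matrix(Y, mask, lam, step=1.0, n_iter=500):
    """Penalized least squares with a nuclear-norm (Schatten-1) penalty.

    Y    : observed m x T matrix (arbitrary values where mask is False)
    mask : boolean m x T array, True where an entry was observed
    lam  : penalty level (illustrative choice, not a tuned value)
    """
    A = np.zeros_like(Y, dtype=float)
    for _ in range(n_iter):
        grad = mask * (A - Y)               # gradient of 0.5 * ||mask*(A - Y)||_F^2
        A = svt(A - step * grad, step * lam)  # proximal gradient step
    return A

# Toy usage: recover a rank-2 matrix from roughly 40% of its noisy entries.
rng = np.random.default_rng(0)
m, T, r = 50, 60, 2
A_true = rng.standard_normal((m, r)) @ rng.standard_normal((r, T))
mask = rng.random((m, T)) < 0.4
Y = np.where(mask, A_true + 0.1 * rng.standard_normal((m, T)), 0.0)
A_hat = complete_matrix(Y, mask, lam=0.5)
print(np.linalg.norm(A_hat - A_true) / np.linalg.norm(A_true))
```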

    Nonparametric estimation of composite functions

    We study the problem of nonparametric estimation of a multivariate function $g:\mathbb{R}^d\to\mathbb{R}$ that can be represented as a composition of two unknown smooth functions $f:\mathbb{R}\to\mathbb{R}$ and $G:\mathbb{R}^d\to\mathbb{R}$. We suppose that $f$ and $G$ belong to known smoothness classes of functions, with smoothness $\gamma$ and $\beta$, respectively. We obtain a full description of the minimax rates of estimation of $g$ in terms of $\gamma$ and $\beta$, and propose rate-optimal estimators for the sup-norm loss. For the construction of such estimators, we first prove an approximation result for composite functions that may be of independent interest, and then a result on adaptation to the local structure. Interestingly, the construction of rate-optimal estimators for composite functions (with given, fixed smoothness) needs adaptation, but not in the traditional sense: it is now adaptation to the local structure. We prove that composition models generate only two types of local structures: the local single-index model and the local model with roughness isolated to a single dimension (i.e., a model containing elements of both additive and single-index structure). We also find the zones of $(\gamma,\beta)$ where no local structure is generated, as well as the zones where composition modeling leads to faster rates, as compared to the classical nonparametric rates that depend only on the overall smoothness of $g$.
    Published at http://dx.doi.org/10.1214/08-AOS611 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
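    For readability, the composition model described in the abstract can be written out as below; the class notation $\Sigma(\cdot)$ is a placeholder introduced here (the abstract says only "known smoothness classes"), not the paper's notation.

```latex
% Composition model: the target g is observed with noise and estimated under
%   g(x) = f(G(x)),   x in R^d,
% where f has smoothness gamma and G has smoothness beta.
\[
  g(x) \;=\; f\bigl(G(x)\bigr), \qquad
  f \in \Sigma(\gamma), \quad G \in \Sigma(\beta),
\]
% with \Sigma(\cdot) a placeholder for the assumed smoothness classes; the
% minimax sup-norm rate for estimating g is characterized in terms of (gamma, beta).
```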

    Fast learning rates for plug-in classifiers under the margin condition

    It has been recently shown that, under the margin (or low noise) assumption, there exist classifiers attaining fast rates of convergence of the excess Bayes risk, i.e., rates faster than $n^{-1/2}$. The works on this subject suggested the following two conjectures: (i) the best achievable fast rate is of the order $n^{-1}$, and (ii) plug-in classifiers generally converge more slowly than classifiers based on empirical risk minimization. We show that both conjectures are incorrect. In particular, we construct plug-in classifiers that can achieve not only fast, but also super-fast rates, i.e., rates faster than $n^{-1}$. We establish minimax lower bounds showing that the obtained rates cannot be improved.
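    A plug-in classifier estimates the regression function $\eta(x)=\mathrm{P}(Y=1\mid X=x)$ and plugs the estimate into the Bayes rule, predicting 1 when the estimate is at least 1/2. The sketch below is a minimal illustration of that idea with a $k$-nearest-neighbour estimate of $\eta$; it is not the specific construction analyzed in the paper, and the toy data and the choice $k=25$ are assumptions made for the example.

```python
import numpy as np

def knn_plugin_classifier(X_train, y_train, X_test, k=25):
    """Plug-in rule: estimate eta(x) = P(Y=1 | X=x), then threshold at 1/2."""
    preds = np.empty(len(X_test), dtype=int)
    for i, x in enumerate(X_test):
        dist = np.linalg.norm(X_train - x, axis=1)   # distances to training points
        nn = np.argsort(dist)[:k]                    # k nearest neighbours
        eta_hat = y_train[nn].mean()                 # estimated regression function
        preds[i] = int(eta_hat >= 0.5)               # plug into the Bayes rule
    return preds

# Toy usage: two shifted Gaussian classes in R^2.
rng = np.random.default_rng(1)
n = 1000
y = rng.integers(0, 2, size=n)
X = rng.standard_normal((n, 2)) + y[:, None] * 1.5
y_hat = knn_plugin_classifier(X[:800], y[:800], X[800:])
print("test error:", np.mean(y_hat != y[800:]))
```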