
    Adapting to Unknown Low-Dimensional Structures in Score-Based Diffusion Models

    This paper investigates score-based diffusion models when the underlying target distribution is concentrated on or near low-dimensional manifolds within the higher-dimensional space in which they formally reside, a common characteristic of natural image distributions. Despite previous efforts to understand the data generation process of diffusion models, existing theoretical support remains highly suboptimal in the presence of low-dimensional structure, which we strengthen in this paper. For the popular Denoising Diffusion Probabilistic Model (DDPM), we find that the dependency of the error incurred within each denoising step on the ambient dimension $d$ is in general unavoidable. We further identify a unique design of coefficients that yields a convergence rate on the order of $O(k^{2}/\sqrt{T})$ (up to log factors), where $k$ is the intrinsic dimension of the target distribution and $T$ is the number of steps. This represents the first theoretical demonstration that the DDPM sampler can adapt to unknown low-dimensional structures in the target distribution, highlighting the critical importance of coefficient design. All of this is achieved by a novel set of analysis tools that characterize the algorithmic dynamics in a more deterministic manner.
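
    The abstract does not spell out the coefficient design it analyzes, but the sketch below illustrates where per-step coefficients enter a standard DDPM reverse update, which is the object the convergence rate above refers to. Here `score_fn` is a hypothetical stand-in for a learned score estimator and the noise schedule is purely illustrative.

```python
# Minimal sketch of one generic DDPM reverse (denoising) step, to show where the
# per-step coefficients discussed above enter the sampler. The specific coefficient
# design analyzed in the paper is NOT reproduced here.
import numpy as np

def ddpm_reverse_step(x_t, t, alphas, score_fn, rng):
    """One reverse step x_t -> x_{t-1} of a standard DDPM sampler."""
    alpha_t = alphas[t]                 # per-step coefficient alpha_t = 1 - beta_t
    sigma_t = np.sqrt(1.0 - alpha_t)    # one common (not the only) noise-scale choice
    # Score-based mean update: move along the estimated score of the noised density.
    mean = (x_t + (1.0 - alpha_t) * score_fn(x_t, t)) / np.sqrt(alpha_t)
    noise = rng.standard_normal(x_t.shape) if t > 1 else 0.0
    return mean + sigma_t * noise

# Usage sketch: start from noise in ambient dimension d and run T reverse steps.
d, T = 64, 1000
rng = np.random.default_rng(0)
alphas = 1.0 - np.linspace(1e-4, 2e-2, T)   # hypothetical schedule
x = rng.standard_normal(d)
for t in range(T - 1, 0, -1):
    # Toy score: exact for a standard Gaussian target, used only to make this runnable.
    x = ddpm_reverse_step(x, t, alphas, lambda z, s: -z, rng)
```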

    Entrywise Inference for Missing Panel Data: A Simple and Instance-Optimal Approach

    Longitudinal or panel data can be represented as a matrix with rows indexed by units and columns indexed by time. We consider inferential questions associated with the missing-data version of panel data induced by staggered adoption. We propose a computationally efficient procedure for estimation, involving only simple matrix algebra and singular value decomposition, and prove non-asymptotic and high-probability bounds on its error in estimating each missing entry. By controlling proximity to a suitably scaled Gaussian variable, we develop and analyze a data-driven procedure for constructing entrywise confidence intervals with pre-specified coverage. Despite its simplicity, our procedure turns out to be instance-optimal: we prove that the width of our confidence intervals matches a non-asymptotic, instance-wise lower bound derived via a Bayesian Cramér-Rao argument. We illustrate the sharpness of our theoretical characterization on a variety of numerical examples. Our analysis is based on a general inferential toolbox for SVD-based algorithms applied to the matrix denoising model, which might be of independent interest.
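
    As a rough illustration of the "simple matrix algebra and SVD" recipe followed by Gaussian entrywise intervals, here is a minimal sketch under a uniform-missingness simplification. The paper's actual staggered-adoption weighting and variance formula are not reproduced; `p_hat` and `sigma_hat` are hypothetical plug-in quantities.

```python
# Generic "SVD estimate + plug-in Gaussian entrywise CI" pattern, simplified to
# uniform missingness for illustration only (the paper treats staggered adoption).
import numpy as np
from scipy import stats

def svd_estimate(Y, observed, rank, p_hat):
    """Inverse-probability-weighted fill of missing entries, then rank-r truncation."""
    Y_filled = np.where(observed, Y, 0.0) / p_hat
    U, s, Vt = np.linalg.svd(Y_filled, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

def entrywise_ci(M_hat, i, j, sigma_hat, level=0.95):
    """Gaussian confidence interval for entry (i, j), given a plug-in std estimate."""
    z = stats.norm.ppf(0.5 + level / 2.0)
    return M_hat[i, j] - z * sigma_hat, M_hat[i, j] + z * sigma_hat

# Usage sketch on synthetic data:
rng = np.random.default_rng(0)
M = np.outer(rng.standard_normal(50), rng.standard_normal(40))   # rank-1 truth
mask = rng.random(M.shape) < 0.6
Y = np.where(mask, M + 0.1 * rng.standard_normal(M.shape), 0.0)
M_hat = svd_estimate(Y, mask, rank=1, p_hat=0.6)
print(entrywise_ci(M_hat, 0, 0, sigma_hat=0.05))
```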

    Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model

    The curse of dimensionality is a widely known issue in reinforcement learning (RL). In the tabular setting where the state space $\mathcal{S}$ and the action space $\mathcal{A}$ are both finite, to obtain a nearly optimal policy with sampling access to a generative model, the minimax optimal sample complexity scales linearly with $|\mathcal{S}|\times|\mathcal{A}|$, which can be prohibitively large when $\mathcal{S}$ or $\mathcal{A}$ is large. This paper considers a Markov decision process (MDP) that admits a set of state-action features that can linearly express (or approximate) its probability transition kernel. We show that a model-based approach (resp. Q-learning) provably learns an $\varepsilon$-optimal policy (resp. Q-function) with high probability as soon as the sample size exceeds the order of $\frac{K}{(1-\gamma)^{3}\varepsilon^{2}}$ (resp. $\frac{K}{(1-\gamma)^{4}\varepsilon^{2}}$), up to some logarithmic factor. Here $K$ is the feature dimension and $\gamma\in(0,1)$ is the discount factor of the MDP. Both sample complexity bounds are provably tight, and our result for the model-based approach matches the minimax lower bound. Our results show that for arbitrarily large-scale MDPs, both the model-based approach and Q-learning are sample-efficient when $K$ is relatively small, hence the title of this paper.
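
    To make the two bounds above concrete, the toy calculation below plugs in hypothetical values of $K$, $\gamma$, and $\varepsilon$; absolute constants and logarithmic factors are ignored.

```python
# Numeric illustration of the two sample-size orders quoted above:
# K/((1-gamma)^3 eps^2) for the model-based approach and K/((1-gamma)^4 eps^2)
# for Q-learning. Constants and log factors omitted; the inputs are hypothetical.
def model_based_samples(K, gamma, eps):
    return K / ((1.0 - gamma) ** 3 * eps ** 2)

def q_learning_samples(K, gamma, eps):
    return K / ((1.0 - gamma) ** 4 * eps ** 2)

K, gamma, eps = 100, 0.99, 0.1
print(f"model-based: ~{model_based_samples(K, gamma, eps):.1e} samples")  # ~1.0e+10
print(f"Q-learning:  ~{q_learning_samples(K, gamma, eps):.1e} samples")   # ~1.0e+12
```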

    The Isotonic Mechanism for Exponential Family Estimation

    In 2023, the International Conference on Machine Learning (ICML) required authors with multiple submissions to rank their submissions based on perceived quality. In this paper, we aim to employ these author-specified rankings to enhance peer review in machine learning and artificial intelligence conferences by extending the Isotonic Mechanism (Su, 2021, 2022) to exponential family distributions. This mechanism generates adjusted scores that align closely with the original scores while adhering to the author-specified rankings. Despite its applicability to a broad spectrum of exponential family distributions, implementing this mechanism does not require knowledge of the specific distribution form. We demonstrate that an author is incentivized to provide accurate rankings when her utility takes the form of a convex additive function of the adjusted review scores. For a certain subclass of exponential family distributions, we prove that the author reports truthfully only if the question involves only pairwise comparisons between her submissions, thus indicating the optimality of ranking in truthful information elicitation. Lastly, we show that the adjusted scores dramatically improve the accuracy of the original scores and achieve nearly minimax optimality for estimating the true scores, with statistical consistency when the true scores have bounded total variation.
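
    A minimal sketch of the core computation behind the mechanism as described above: adjust the raw review scores by isotonic regression along the author-specified ranking. The reading below and its tiny example are illustrative; the paper's full exponential-family treatment is not reproduced.

```python
# Adjust raw review scores so that they respect the author-specified ranking, by
# projecting (in least squares) onto non-increasing sequences along that ranking.
import numpy as np

def pava_nonincreasing(y):
    """Pool-adjacent-violators: closest non-increasing sequence to y in L2."""
    levels, weights = [], []
    for v in np.asarray(y, dtype=float):
        levels.append(v)
        weights.append(1)
        # Merge adjacent blocks while the non-increasing constraint is violated.
        while len(levels) > 1 and levels[-2] < levels[-1]:
            w = weights[-2] + weights[-1]
            merged = (levels[-2] * weights[-2] + levels[-1] * weights[-1]) / w
            levels[-2:] = [merged]
            weights[-2:] = [w]
    return np.repeat(levels, weights)

# Usage: the author ranks submission 2 > 0 > 1, but the raw scores violate that order.
raw = np.array([5.5, 6.0, 4.8])
ranking = [2, 0, 1]                      # submission indices, best to worst
adjusted = np.empty_like(raw)
adjusted[ranking] = pava_nonincreasing(raw[ranking])
print(adjusted)                          # all three pulled to the common mean 5.43...
```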

    Minimax-Optimal Reward-Agnostic Exploration in Reinforcement Learning

    This paper studies reward-agnostic exploration in reinforcement learning (RL) -- a scenario where the learner is unaware of the reward functions during the exploration stage -- and designs an algorithm that improves over the state of the art. More precisely, consider a finite-horizon non-stationary Markov decision process with $S$ states, $A$ actions, and horizon length $H$, and suppose that there are no more than a polynomial number of given reward functions of interest. By collecting on the order of $\frac{SAH^{3}}{\varepsilon^{2}}$ sample episodes (up to log factor) without guidance of the reward information, our algorithm is able to find $\varepsilon$-optimal policies for all these reward functions, provided that $\varepsilon$ is sufficiently small. This forms the first reward-agnostic exploration scheme in this context that achieves provable minimax optimality. Furthermore, once the sample size exceeds $\frac{S^{2}AH^{3}}{\varepsilon^{2}}$ episodes (up to log factor), our algorithm is able to yield $\varepsilon$ accuracy for arbitrarily many reward functions (even when they are adversarially designed), a task commonly dubbed "reward-free exploration." The novelty of our algorithm design draws on insights from offline RL: the exploration scheme attempts to maximize a critical reward-agnostic quantity that dictates the performance of offline RL, while the policy learning paradigm leverages ideas from sample-optimal offline RL paradigms.

    Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games

    This paper makes progress towards learning Nash equilibria in two-player zero-sum Markov games from offline data. Specifically, consider a $\gamma$-discounted infinite-horizon Markov game with $S$ states, where the max-player has $A$ actions and the min-player has $B$ actions. We propose a pessimistic model-based algorithm with Bernstein-style lower confidence bounds -- called VI-LCB-Game -- that provably finds an $\varepsilon$-approximate Nash equilibrium with a sample complexity no larger than $\frac{C_{\mathsf{clipped}}^{\star}S(A+B)}{(1-\gamma)^{3}\varepsilon^{2}}$ (up to some log factor). Here, $C_{\mathsf{clipped}}^{\star}$ is a unilateral clipped concentrability coefficient that reflects the coverage and distribution shift of the available data (vis-à-vis the target data), and the target accuracy $\varepsilon$ can be any value within $\big(0,\frac{1}{1-\gamma}\big]$. Our sample complexity bound strengthens prior art by a factor of $\min\{A,B\}$, achieving minimax optimality for the entire $\varepsilon$-range. An appealing feature of our result lies in its algorithmic simplicity, which reveals that neither variance reduction nor sample splitting is necessary to achieve sample optimality. Comment: accepted to Operations Research.
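
    The per-state subroutine inside value iteration for a zero-sum Markov game is a matrix-game solve, sketched below via linear programming. The paper's pessimistic lower-confidence-bound construction of the payoff matrix is not reproduced; the matrix used here is only an example.

```python
# Solve the matrix game max_x min_y x^T Q y by linear programming -- the per-state
# step inside value iteration for zero-sum Markov games. Q is illustrative only.
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(Q):
    """Return (game value, max-player mixed strategy) for payoff matrix Q (A x B)."""
    A, B = Q.shape
    # Variables: x_1..x_A (mixed strategy), v (game value). Objective: maximize v.
    c = np.zeros(A + 1)
    c[-1] = -1.0
    # For every min-player action b:  v - sum_a x_a Q[a, b] <= 0
    A_ub = np.hstack([-Q.T, np.ones((B, 1))])
    b_ub = np.zeros(B)
    # Probabilities sum to one (v gets a zero coefficient).
    A_eq = np.concatenate([np.ones(A), [0.0]]).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * A + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:-1]

value, x_star = matrix_game_value(np.array([[1.0, -1.0], [-1.0, 1.0]]))  # matching pennies
print(value, x_star)  # value ~ 0.0, strategy ~ [0.5, 0.5]
```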

    Inference and Uncertainty Quantification for Noisy Matrix Completion

    Noisy matrix completion aims at estimating a low-rank matrix given only partial and corrupted entries. Despite substantial progress in designing efficient estimation algorithms, it remains largely unclear how to assess the uncertainty of the obtained estimates and how to perform statistical inference on the unknown matrix (e.g., constructing a valid and short confidence interval for an unseen entry). This paper takes a step towards inference and uncertainty quantification for noisy matrix completion. We develop a simple procedure to compensate for the bias of the widely used convex and nonconvex estimators. The resulting de-biased estimators admit nearly precise non-asymptotic distributional characterizations, which in turn enable optimal construction of confidence intervals / regions for, say, the missing entries and the low-rank factors. Our inferential procedures do not rely on sample splitting, thus avoiding unnecessary loss of data efficiency. As a byproduct, we obtain a sharp characterization of the estimation accuracy of our de-biased estimators, which, to the best of our knowledge, are the first tractable algorithms that provably achieve full statistical efficiency (including the preconstant). The analysis herein is built upon the intimate link between convex and nonconvex optimization -- an appealing feature recently discovered by \cite{chen2019noisy}. Comment: published in Proceedings of the National Academy of Sciences, Nov 2019, 116 (46) 22931-2293.
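
    A schematic sketch of the "debias, then build a Gaussian entrywise interval" pattern described above. The correction shown (add back inverse-probability-weighted residuals on observed entries and re-project to rank r) is one common recipe in this literature, not the paper's exact formula, and the plug-in standard deviation mentioned at the end is hypothetical.

```python
# Schematic debiasing step for noisy matrix completion. The paper's exact correction
# and variance formula are not reproduced; this shows only the overall shape, with
# `p` the observation probability and `rank` the assumed rank.
import numpy as np

def debias_and_project(Z_hat, Y_obs, observed, p, rank):
    """Add inverse-probability-weighted residuals on observed entries, then re-project to rank r."""
    residual = np.where(observed, Y_obs - Z_hat, 0.0)
    U, s, Vt = np.linalg.svd(Z_hat + residual / p, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

# A confidence interval for entry (i, j) is then M_d[i, j] +/- z * sigma_ij, where
# sigma_ij is a plug-in standard deviation supplied by the distributional theory.
```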