919 research outputs found
Adapting to Unknown Low-Dimensional Structures in Score-Based Diffusion Models
This paper investigates score-based diffusion models when the underlying
target distribution is concentrated on or near low-dimensional manifolds within
the higher-dimensional space in which they formally reside, a common
characteristic of natural image distributions. Despite previous efforts to
understand the data generation process of diffusion models, existing
theoretical support remains highly suboptimal in the presence of
low-dimensional structure, a gap we address in this paper. For the popular
Denoising Diffusion Probabilistic Model (DDPM), we find that the dependency of
the error incurred within each denoising step on the ambient dimension is
in general unavoidable. We further identify a unique design of coefficients
that yields a convergence rate of the order of (up to log
factors), where is the intrinsic dimension of the target distribution and
is the number of steps. This represents the first theoretical demonstration
that the DDPM sampler can adapt to unknown low-dimensional structures in the
target distribution, highlighting the critical importance of coefficient
design. All of this is achieved by a novel set of analysis tools that
characterize the algorithmic dynamics in a more deterministic manner.
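The DDPM sampler discussed above admits a compact score-based form. The sketch below is an illustration, not the paper's coefficient design: the linear beta schedule and the Gaussian toy target (whose noised marginals stay standard Gaussian, so the exact score is simply -x) are assumptions chosen so the update can be run end to end.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200                                # number of denoising steps
betas = np.linspace(1e-4, 0.02, T)     # a common variance schedule (assumption)
alphas = 1.0 - betas

def score(x, t):
    # Exact score of the noised marginal when the target is N(0, 1):
    # every noised marginal is again N(0, 1), whose score is -x.
    return -x

x = rng.standard_normal(5000)          # initialize from pure noise
for t in range(T - 1, -1, -1):
    z = rng.standard_normal(x.shape) if t > 0 else 0.0
    # Standard DDPM reverse step written in score form:
    #   x_{t-1} = (x_t + beta_t * s_t(x_t)) / sqrt(alpha_t) + sqrt(beta_t) * z_t
    x = (x + betas[t] * score(x, t)) / np.sqrt(alphas[t]) + np.sqrt(betas[t]) * z
```

In practice the analytic `score` is replaced by a learned network; here the output should be approximately standard Gaussian, which makes the sampler easy to sanity-check.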
Entrywise Inference for Missing Panel Data: A Simple and Instance-Optimal Approach
Longitudinal or panel data can be represented as a matrix with rows indexed
by units and columns indexed by time. We consider inferential questions
associated with the missing data version of panel data induced by staggered
adoption. We propose a computationally efficient procedure for estimation,
involving only simple matrix algebra and singular value decomposition, and
prove non-asymptotic and high-probability bounds on its error in estimating
each missing entry. By controlling proximity to a suitably scaled Gaussian
variable, we develop and analyze a data-driven procedure for constructing
entrywise confidence intervals with pre-specified coverage. Despite its
simplicity, our procedure turns out to be instance-optimal: we prove that the
width of our confidence intervals matches a non-asymptotic instance-wise lower
bound derived via a Bayesian Cram\'{e}r-Rao argument. We illustrate the
sharpness of our theoretical characterization on a variety of numerical
examples. Our analysis is based on a general inferential toolbox for SVD-based
algorithms applied to the matrix denoising model, which might be of independent
interest.
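As a minimal grounding of the "simple matrix algebra and singular value decomposition" recipe, here is a generic spectral estimator for a low-rank matrix observed at random entries. The inverse-propensity scaling and the missing-at-random pattern are simplifying assumptions; the paper's staggered-adoption procedure and its entrywise confidence intervals differ in their details.

```python
import numpy as np

rng = np.random.default_rng(1)

def svd_estimate(Y, mask, p, r):
    """Truncated SVD of the zero-filled, inverse-propensity-scaled matrix."""
    Z = np.where(mask, Y, 0.0) / p               # unbiased for the signal matrix
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]        # keep the top-r spectral part

# Synthetic rank-2 panel (units x time) with noise and 50% missing entries.
n_units, n_times, r, p = 1000, 500, 2, 0.5
M = rng.standard_normal((n_units, r)) @ rng.standard_normal((r, n_times))
Y = M + 0.1 * rng.standard_normal((n_units, n_times))
mask = rng.random((n_units, n_times)) < p
M_hat = svd_estimate(Y, mask, p, r)
rel_err = np.linalg.norm(M_hat - M) / np.linalg.norm(M)
```

Zero-filling then rescaling by 1/p makes the input matrix an unbiased proxy for the signal, so the top-r SVD recovers every entry, observed or not.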
Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model
The curse of dimensionality is a widely known issue in reinforcement learning
(RL). In the tabular setting where the state space and the action
space are both finite, to obtain a nearly optimal policy with
sampling access to a generative model, the minimax optimal sample complexity
scales linearly with , which can be
prohibitively large when or is large. This paper
considers a Markov decision process (MDP) that admits a set of state-action
features, which can linearly express (or approximate) its probability
transition kernel. We show that a model-based approach (resp. Q-learning)
provably learns an -optimal policy (resp. Q-function) with high
probability as soon as the sample size exceeds the order of
(resp.), up to some logarithmic
factor. Here is the feature dimension and is the discount
factor of the MDP. Both sample complexity bounds are provably tight, and our
result for the model-based approach matches the minimax lower bound. Our
results show that for arbitrarily large-scale MDP, both the model-based
approach and Q-learning are sample-efficient when is relatively small, and
hence the title of this paper.
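A minimal sketch of the feature-based setting: Q-learning with a linear parameterization under a generative model that lets us sample any state-action pair on demand. The toy MDP and the one-hot features (which make the linear model exact) are assumptions for illustration; the paper analyzes general feature maps and establishes much sharper sample complexities than this plain variant.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy tabular MDP; one-hot features make the linear parameterization exact.
S, A, gamma = 5, 3, 0.5
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = distribution over next states
R = rng.random((S, A))
d = S * A
PHI = np.eye(d)                              # phi(s, a) = PHI[s * A + a]

# Ground-truth Q* via value iteration on the known model.
Q_star = np.zeros((S, A))
for _ in range(200):
    Q_star = R + gamma * P @ Q_star.max(axis=1)

# Linear Q-learning with a generative model: sample any (s, a) on demand.
theta = np.zeros(d)
eta = 0.05
for _ in range(60000):
    s, a = rng.integers(S), rng.integers(A)
    s_next = rng.choice(S, p=P[s, a])
    q_next = max(PHI[s_next * A + b] @ theta for b in range(A))
    td_error = R[s, a] + gamma * q_next - PHI[s * A + a] @ theta
    theta = theta + eta * td_error * PHI[s * A + a]

Q_hat = theta.reshape(S, A)
```

With one-hot features the learned parameter vector is exactly a tabular Q-function, so `Q_hat` can be compared entrywise against the value-iteration ground truth.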
The Isotonic Mechanism for Exponential Family Estimation
In 2023, the International Conference on Machine Learning (ICML) required
authors with multiple submissions to rank their submissions based on perceived
quality. In this paper, we aim to employ these author-specified rankings to
enhance peer review in machine learning and artificial intelligence conferences
by extending the Isotonic Mechanism (Su, 2021, 2022) to exponential family
distributions. This mechanism generates adjusted scores that align closely with the
original scores while adhering to author-specified rankings. Despite its
applicability to a broad spectrum of exponential family distributions, this
mechanism's implementation does not necessitate knowledge of the specific
distribution form. We demonstrate that an author is incentivized to provide
accurate rankings when her utility takes the form of a convex additive function
of the adjusted review scores. For a certain subclass of exponential family
distributions, we prove that the author reports truthfully only if the question
involves only pairwise comparisons between her submissions, thus indicating the
optimality of ranking in truthful information elicitation. Lastly, we show that
the adjusted scores dramatically improve the accuracy of the original scores
and achieve nearly minimax optimality for estimating the true scores, with
statistical consistency, when the true scores have bounded total variation.
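In the simplest (Gaussian) case, the adjustment amounts to least-squares isotonic regression of the raw scores onto the author's claimed order; the exponential-family mechanism in the paper is more general. Here is a self-contained pool-adjacent-violators sketch of that Gaussian special case:

```python
import numpy as np

def isotonic_adjust(scores):
    """Least-squares projection of `scores` onto nondecreasing sequences,
    via the pool-adjacent-violators algorithm (PAVA).

    `scores` lists raw review scores in the author's claimed quality order,
    weakest submission first."""
    blocks = []                      # each block holds [sum, count]
    for y in scores:
        blocks.append([float(y), 1])
        # Merge adjacent blocks while their means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return np.array(out)

adjusted = isotonic_adjust([6.0, 4.0, 5.0])   # -> [5.0, 5.0, 5.0]
```

Scores that already respect the claimed ranking pass through unchanged; violating stretches are averaged, which is what ties the adjusted scores closely to the originals.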
Minimax-Optimal Reward-Agnostic Exploration in Reinforcement Learning
This paper studies reward-agnostic exploration in reinforcement learning (RL)
-- a scenario where the learner is unaware of the reward functions during the
exploration stage -- and designs an algorithm that improves over the state of
the art. More precisely, consider a finite-horizon non-stationary Markov
decision process with $S$ states, $A$ actions, and horizon length $H$, and
suppose that there are no more than a polynomial number of given reward
functions of interest. By collecting an order of \begin{align*}
\frac{SAH^3}{\varepsilon^2} \text{ sample episodes (up to log factor)}
\end{align*} without guidance of the reward information, our algorithm is able
to find $\varepsilon$-optimal policies for all these reward functions, provided
that $\varepsilon$ is sufficiently small. This forms the first reward-agnostic
exploration scheme in this context that achieves provable minimax optimality.
Furthermore, once the sample size exceeds
episodes (up to log factor), our algorithm is able to yield
accuracy for arbitrarily many reward functions (even when they are
adversarially designed), a task commonly dubbed ``reward-free exploration.''
The novelty of our algorithm design draws on insights from offline RL: the
exploration scheme attempts to maximize a critical reward-agnostic quantity
that dictates the performance of offline RL, while the policy learning step
leverages ideas from sample-optimal offline RL algorithms.
Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games
This paper makes progress towards learning Nash equilibria in two-player
zero-sum Markov games from offline data. Specifically, consider a
-discounted infinite-horizon Markov game with states, where the
max-player has actions and the min-player has actions. We propose a
pessimistic model-based algorithm with Bernstein-style lower confidence bounds
-- called VI-LCB-Game -- that provably finds an -approximate Nash
equilibrium with a sample complexity no larger than
(up
to some log factor). Here, is some unilateral
clipped concentrability coefficient that reflects the coverage and distribution
shift of the available data (vis-\`a-vis the target data), and the target
accuracy can be any value within
. Our sample complexity bound strengthens prior
art by a factor of , achieving minimax optimality for the entire
-range. An appealing feature of our result lies in algorithmic
simplicity, which shows that variance reduction and sample splitting are
unnecessary for achieving sample optimality.
Comment: accepted to Operations Research
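As single-state background for the Nash-equilibrium objective (not the paper's VI-LCB-Game algorithm, which is model-based and pessimistic over offline data), here is fictitious play on a zero-sum matrix game. The payoff matrix is a made-up example with a saddle point at (row 0, column 1) and value 1, chosen so convergence is easy to verify.

```python
import numpy as np

# Zero-sum matrix game: the row player maximizes x' A y, the column player minimizes.
A = np.array([[2.0, 1.0],
              [0.0, -1.0]])         # saddle point at (row 0, col 1), value 1

row_counts = np.zeros(2)
col_counts = np.zeros(2)
i, j = 0, 0                         # arbitrary initial pure actions
for _ in range(2000):
    row_counts[i] += 1
    col_counts[j] += 1
    # Each player best-responds to the opponent's empirical mixture so far.
    i = int(np.argmax(A @ (col_counts / col_counts.sum())))
    j = int(np.argmin((row_counts / row_counts.sum()) @ A))

x = row_counts / row_counts.sum()   # empirical (approximately equilibrium) strategies
y = col_counts / col_counts.sum()
value = x @ A @ y
```

For zero-sum games the empirical mixtures of fictitious play converge to a Nash equilibrium; with a saddle point, both players lock onto their equilibrium actions almost immediately.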
Inference and Uncertainty Quantification for Noisy Matrix Completion
Noisy matrix completion aims at estimating a low-rank matrix given only
partial and corrupted entries. Despite substantial progress in designing
efficient estimation algorithms, it remains largely unclear how to assess the
uncertainty of the obtained estimates and how to perform statistical inference
on the unknown matrix (e.g., constructing a valid and short confidence interval
for an unseen entry).
This paper takes a step towards inference and uncertainty quantification for
noisy matrix completion. We develop a simple procedure to compensate for the
bias of the widely used convex and nonconvex estimators. The resulting
de-biased estimators admit nearly precise non-asymptotic distributional
characterizations, which in turn enable optimal construction of confidence
intervals/regions for, say, the missing entries and the low-rank factors.
Our inferential procedures do not rely on sample splitting, thus avoiding
unnecessary loss of data efficiency. As a byproduct, we obtain a sharp
characterization of the estimation accuracy of our de-biased estimators, which,
to the best of our knowledge, are the first tractable algorithms that provably
achieve full statistical efficiency (including the preconstant). The analysis
herein is built upon the intimate link between convex and nonconvex
optimization --- an appealing feature recently discovered by
\cite{chen2019noisy}.
Comment: published at Proceedings of the National Academy of Sciences, Nov
2019, 116 (46) 22931-2293
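A sketch of the de-biasing idea in a commonly stated form (a simplified rendition, not the paper's exact procedure): add the inverse-propensity-weighted residual on the observed entries back to an initial estimate, then project onto rank-r matrices. The sanity check below uses the noiseless case, where the step should leave an exact estimate unchanged.

```python
import numpy as np

def debias(Z, Y, mask, p, r):
    """De-bias an initial estimate Z of a rank-r matrix observed as Y on `mask`
    (each entry seen independently with probability p): add back the scaled
    residual on observed entries, then project onto rank-r matrices."""
    D = Z + np.where(mask, Y - Z, 0.0) / p
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

# Noiseless sanity check: if Z already equals the truth, de-biasing is a no-op.
rng = np.random.default_rng(3)
M = rng.standard_normal((80, 3)) @ rng.standard_normal((3, 60))
mask = rng.random((80, 60)) < 0.5
M_db = debias(M, M, mask, p=0.5, r=3)
```

The added residual term corrects the shrinkage bias that regularized estimators incur on the observed entries, which is what makes the entrywise distributional characterization possible.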
