Sampling Correctors
In many situations, sample data is obtained from a noisy or imperfect source.
In order to address such corruptions, this paper introduces the concept of a
sampling corrector. Such algorithms use structure that the distribution is
purported to have, in order to allow one to make "on-the-fly" corrections to
samples drawn from probability distributions. These algorithms then act as
filters between the noisy data and the end user.
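To make the filter picture concrete, here is a minimal sketch of what such a corrector's interface might look like. The class name, the batch_size parameter, and the pluggable correction rule are our own illustrative assumptions, not the paper's construction.

```python
# A minimal sketch of the "filter" view of a sampling corrector, assuming a
# hypothetical interface: `noisy_sampler` returns one draw from the corrupted
# distribution, and `correct` stands in for whatever structure-based fix-up a
# concrete corrector performs.
import random
from typing import Callable, List


class SamplingCorrector:
    """Sits between a noisy sample source and the end user."""

    def __init__(self, noisy_sampler: Callable[[], int],
                 correct: Callable[[List[int]], int],
                 batch_size: int = 1):
        self.noisy_sampler = noisy_sampler   # imperfect upstream source
        self.correct = correct               # structure-aware correction rule
        self.batch_size = batch_size         # raw draws consumed per output

    def sample(self) -> int:
        # Draw a small batch from the noisy source, then emit one corrected sample.
        raw = [self.noisy_sampler() for _ in range(self.batch_size)]
        return self.correct(raw)


# Toy usage: a trivial "corrector" that passes one raw draw through unchanged.
if __name__ == "__main__":
    uniform_over_10 = lambda: random.randrange(10)
    identity = lambda raw: raw[0]
    corrector = SamplingCorrector(uniform_over_10, identity)
    print([corrector.sample() for _ in range(5)])
```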
We show connections between sampling correctors, distribution learning
algorithms, and distribution property testing algorithms. We show that these
connections can be utilized to expand the applicability of known distribution
learning and property testing algorithms as well as to achieve improved
algorithms for those tasks.
As a first step, we show how to design sampling correctors using proper
learning algorithms. We then focus on the question of whether algorithms for
sampling correctors can be more efficient in terms of sample complexity than
learning algorithms for the analogous families of distributions. When
correcting monotonicity, we show that this is indeed the case when also granted
query access to the cumulative distribution function. We also obtain sampling
correctors for monotonicity without this stronger type of access, provided that
the distribution is originally very close to monotone (namely, at a distance
). In addition, we consider a restricted error model
that aims to capture "missing data" corruptions. In this model, we show that
distributions that are close to monotone have sampling correctors that are
significantly more efficient than achievable by the learning approach.
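As an illustration of the learning-based route (not the paper's actual algorithm), the sketch below learns a monotone hypothesis from draws of the noisy distribution, using an empirical histogram followed by pool-adjacent-violators to enforce a non-increasing shape, and then answers every request with a fresh draw from that hypothesis. All function names and the choice of PAV are illustrative assumptions.

```python
# Illustrative sketch: correct toward monotonicity by properly learning a
# monotone (non-increasing) hypothesis and then sampling from it.
import random
from collections import Counter
from typing import Callable, List


def empirical_histogram(samples: List[int], n: int) -> List[float]:
    counts = Counter(samples)
    total = len(samples)
    return [counts.get(i, 0) / total for i in range(n)]


def pav_non_increasing(p: List[float]) -> List[float]:
    """Pool-adjacent-violators: closest non-increasing sequence, renormalized."""
    merged = []  # each entry is [block mean, block size]
    for v in p:
        merged.append([v, 1])
        while len(merged) > 1 and merged[-2][0] < merged[-1][0]:
            v2, s2 = merged.pop()
            v1, s1 = merged.pop()
            merged.append([(v1 * s1 + v2 * s2) / (s1 + s2), s1 + s2])
    out = []
    for v, s in merged:
        out.extend([v] * s)
    z = sum(out)
    return [v / z for v in out] if z > 0 else [1.0 / len(p)] * len(p)


def learned_monotone_corrector(noisy_sampler: Callable[[], int], n: int,
                               training_samples: int = 10_000) -> Callable[[], int]:
    # Spend samples up front to learn a monotone hypothesis, then sample from it.
    train = [noisy_sampler() for _ in range(training_samples)]
    hypothesis = pav_non_increasing(empirical_histogram(train, n))
    support = list(range(n))
    return lambda: random.choices(support, weights=hypothesis, k=1)[0]
```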
We also consider the question of whether an additional source of independent
random bits is required by sampling correctors to implement the correction
process.
Testing probability distributions using conditional samples
We study a new framework for property testing of probability distributions,
by considering distribution testing algorithms that have access to a
conditional sampling oracle.* This is an oracle that takes as input a subset $S$ of the domain of the unknown probability distribution $p$
and returns a draw from the conditional probability distribution of $p$ restricted
to $S$. This new model allows considerable flexibility in the design of
distribution testing algorithms; in particular, testing algorithms in this
model can be adaptive.
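For concreteness, a conditional sampling (COND) oracle is easy to simulate when the distribution is known explicitly. The sketch below only pins down the oracle's interface; its behavior on zero-mass query sets is an arbitrary convention of ours, not something fixed by the paper.

```python
# Sketch of a COND oracle: given a known distribution p over {0, ..., n-1} and a
# query set S, return one draw from p conditioned on landing in S.
import random
from typing import Sequence, Set


def cond_sample(p: Sequence[float], S: Set[int]) -> int:
    members = sorted(S)
    weights = [p[i] for i in members]
    if sum(weights) == 0:
        # The model must specify behavior on zero-mass sets; here we fall back
        # to a uniform draw over S (an arbitrary convention of this sketch).
        return random.choice(members)
    return random.choices(members, weights=weights, k=1)[0]


# Example: conditioning a skewed distribution on the set {0, 3}.
p = [0.5, 0.2, 0.2, 0.1]
draws = [cond_sample(p, {0, 3}) for _ in range(5)]
```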
We study a wide range of natural distribution testing problems in this new
framework and some of its variants, giving both upper and lower bounds on query
complexity. These problems include testing whether $p$ is the uniform
distribution $\mathcal{U}$; testing whether $p = p^*$ for an explicitly
provided $p^*$; testing whether two unknown distributions $p$ and $q$
are equivalent; and estimating the variation distance between $p$ and the
uniform distribution. At a high level, our main finding is that the new
"conditional sampling" framework we consider is a powerful one: while all the
problems mentioned above have $\Omega(\sqrt{n})$ sample complexity in the
standard model (and in some cases the complexity must be almost linear in $n$),
we give $\mathrm{poly}(\log n, 1/\epsilon)$-query algorithms (and in some
cases $\mathrm{poly}(1/\epsilon)$-query algorithms independent of $n$) for
all these problems in our conditional sampling setting.
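One intuition for this gap is that a two-element query set lets an algorithm compare the probabilities of two individual points directly, with no dependence on the domain size. The sketch below isolates that comparison idea; the trial count is an illustrative constant, not a parameter taken from the paper.

```python
# Illustrative primitive: estimate the ratio p(x)/p(y) using only COND queries
# on the two-element set {x, y}. The number of trials is for illustration only.
def compare(cond_sample, x: int, y: int, trials: int = 1000) -> float:
    hits_x = sum(1 for _ in range(trials) if cond_sample({x, y}) == x)
    hits_y = trials - hits_x
    if hits_y == 0:
        return float("inf")
    return hits_x / hits_y  # concentrates around p(x) / p(y)
```

Here `cond_sample` can be any callable with the interface of the oracle sketched above, for instance a closure over a known distribution.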
*Independently from our work, Chakraborty et al. also considered this
framework. We discuss their work in Subsection [1.4].
Testing Conditional Independence of Discrete Distributions
We study the problem of testing \emph{conditional independence} for discrete
distributions. Specifically, given samples from a discrete random variable $(X, Y, Z)$ on domain $[\ell_1] \times [\ell_2] \times [n]$, we want to distinguish,
with probability at least $2/3$, between the case that $X$ and $Y$ are
conditionally independent given $Z$ from the case that $(X, Y, Z)$ is
$\epsilon$-far, in $\ell_1$-distance, from every distribution that has this
property. Conditional independence is a concept of central importance in
probability and statistics with a range of applications in various scientific
domains. As such, the statistical task of testing conditional independence has
been extensively studied in various forms within the statistics and
econometrics communities for nearly a century. Perhaps surprisingly, this
problem has not been previously considered in the framework of distribution
property testing and in particular no tester with sublinear sample complexity
is known, even for the important special case that the domains of $X$ and $Y$
are binary.
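For reference, the testing problem can be stated as follows (the notation here is generic and may differ from the paper's): accept if the joint distribution $p$ of $(X, Y, Z)$ satisfies
\[
\Pr[X = x, Y = y \mid Z = z] \;=\; \Pr[X = x \mid Z = z]\,\Pr[Y = y \mid Z = z] \quad \text{for all } (x, y, z),
\]
and reject, with probability at least $2/3$, whenever $\min_{q \in \mathcal{P}_{\mathrm{CI}}} \|p - q\|_1 > \epsilon$, where $\mathcal{P}_{\mathrm{CI}}$ denotes the set of distributions under which $X$ and $Y$ are conditionally independent given $Z$.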
The main algorithmic result of this work is the first conditional
independence tester with {\em sublinear} sample complexity for discrete
distributions over $[\ell_1] \times [\ell_2] \times [n]$. To complement our upper
bounds, we prove information-theoretic lower bounds establishing that the
sample complexity of our algorithm is optimal, up to constant factors, for a
number of settings. Specifically, for the prototypical setting when $\ell_1, \ell_2 = O(1)$, we show that the sample complexity of testing conditional
independence (upper bound and matching lower bound) is
\[
\Theta\left({\max\left(n^{1/2}/\epsilon^2,\min\left(n^{7/8}/\epsilon,n^{6/7}/\epsilon^{8/7}\right)\right)}\right)\,.
\]
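As a purely illustrative check of how the two branches of this bound trade off, the following snippet (our own, with arbitrary sample values of $n$ and $\epsilon$) evaluates both terms: the $n^{1/2}/\epsilon^2$ term dominates for small $n$, and the $\min(\cdot,\cdot)$ branch takes over as $n$ grows.

```python
# Quick illustration of the bound above: for fixed epsilon, n^{1/2}/eps^2
# dominates for small n, while min(n^{7/8}/eps, n^{6/7}/eps^{8/7}) takes over
# once n is large enough.
def ci_testing_bound(n: float, eps: float) -> float:
    return max(n ** 0.5 / eps ** 2,
               min(n ** (7 / 8) / eps, n ** (6 / 7) / eps ** (8 / 7)))


for n in (10 ** 2, 10 ** 4, 10 ** 8):
    print(n, round(ci_testing_bound(n, 0.1)))
```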
Learning circuits with few negations
Monotone Boolean functions, and the monotone Boolean circuits that compute
them, have been intensively studied in complexity theory. In this paper we
study the structure of Boolean functions in terms of the minimum number of
negations in any circuit computing them, a complexity measure that interpolates
between monotone functions and the class of all functions. We study this
generalization of monotonicity from the vantage point of learning theory,
giving near-matching upper and lower bounds on the uniform-distribution
learnability of circuits in terms of the number of negations they contain. Our
upper bounds are based on a new structural characterization of negation-limited
circuits that extends a classical result of A. A. Markov. Our lower bounds,
which employ Fourier-analytic tools from hardness amplification, give new
results even for circuits with no negations (i.e., monotone functions).
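The classical result alluded to here is, as we recall it (this paraphrase is ours and may differ in details from the statement the paper extends), Markov's theorem: the minimum number of negation gates needed to compute a Boolean function $f$ is
\[
I(f) \;=\; \bigl\lceil \log_2\bigl(d(f) + 1\bigr) \bigr\rceil,
\]
where $d(f)$ is the maximum number of positions at which $f$ decreases along any increasing chain in $\{0,1\}^n$.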