
    Sampling Correctors

    In many situations, sample data is obtained from a noisy or imperfect source. In order to address such corruptions, this paper introduces the concept of a sampling corrector. Such algorithms use structure that the distribution is purported to have in order to make "on-the-fly" corrections to samples drawn from probability distributions. These algorithms then act as filters between the noisy data and the end user. We show connections between sampling correctors, distribution learning algorithms, and distribution property testing algorithms, and show that these connections can be used to expand the applicability of known distribution learning and property testing algorithms as well as to achieve improved algorithms for those tasks. As a first step, we show how to design sampling correctors using proper learning algorithms. We then focus on the question of whether sampling correctors can be more efficient in terms of sample complexity than learning algorithms for the analogous families of distributions. When correcting monotonicity, we show that this is indeed the case when also granted query access to the cumulative distribution function. We also obtain sampling correctors for monotonicity without this stronger type of access, provided that the distribution is originally very close to monotone (namely, at distance $O(1/\log^2 n)$). In addition, we consider a restricted error model that aims at capturing "missing data" corruptions. In this model, we show that distributions that are close to monotone have sampling correctors that are significantly more efficient than achievable by the learning approach. We also consider the question of whether an additional source of independent random bits is required by sampling correctors to implement the correction process.
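
    For intuition, here is a minimal sketch of the filtering idea, assuming the purported structure is that the distribution is non-increasing on $\{0,\dots,n-1\}$: learn a rough empirical pmf from a batch of noisy samples, clip it to be non-increasing, renormalize, and serve samples from the corrected pmf. The function name `monotone_correct`, the batch size, and the crude clipping step are illustrative choices of ours; the paper's correctors are instead built from proper learning algorithms with provable guarantees.

        import random
        from collections import Counter

        def monotone_correct(sample_fn, n, batch=10_000):
            # Illustrative sampling-corrector sketch, NOT the paper's algorithm.
            # sample_fn() returns a draw from a noisy source on {0, ..., n-1}
            # that is purported to be (close to) non-increasing.
            counts = Counter(sample_fn() for _ in range(batch))
            pmf = [counts[i] / batch for i in range(n)]
            # Crude "correction": clip the empirical pmf to be non-increasing,
            # then renormalize so it is again a probability distribution.
            for i in range(1, n):
                pmf[i] = min(pmf[i], pmf[i - 1])
            total = sum(pmf)
            pmf = [p / total for p in pmf] if total > 0 else [1 / n] * n
            while True:                      # act as a filter: emit corrected samples
                yield random.choices(range(n), weights=pmf)[0]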

    Testing probability distributions using conditional samples

    We study a new framework for property testing of probability distributions, by considering distribution testing algorithms that have access to a conditional sampling oracle.* This is an oracle that takes as input a subset $S \subseteq [N]$ of the domain $[N]$ of the unknown probability distribution $D$ and returns a draw from the conditional probability distribution $D$ restricted to $S$. This new model allows considerable flexibility in the design of distribution testing algorithms; in particular, testing algorithms in this model can be adaptive. We study a wide range of natural distribution testing problems in this new framework and some of its variants, giving both upper and lower bounds on query complexity. These problems include testing whether $D$ is the uniform distribution $\mathcal{U}$; testing whether $D = D^\ast$ for an explicitly provided $D^\ast$; testing whether two unknown distributions $D_1$ and $D_2$ are equivalent; and estimating the variation distance between $D$ and the uniform distribution. At a high level, our main finding is that the new "conditional sampling" framework we consider is a powerful one: while all the problems mentioned above have $\Omega(\sqrt{N})$ sample complexity in the standard model (and in some cases the complexity must be almost linear in $N$), we give $\mathrm{poly}(\log N, 1/\varepsilon)$-query algorithms (and in some cases $\mathrm{poly}(1/\varepsilon)$-query algorithms independent of $N$) for all these problems in our conditional sampling setting. *Independently from our work, Chakraborty et al. also considered this framework; we discuss their work in Subsection 1.4. Comment: significant changes to Section 9 (detailing and expanding the proof of Theorem 16); several clarifications and typos fixed in various places.
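
    To make the model concrete, the sketch below (interface and names are ours, not the paper's) simulates a conditional sampling oracle for a known pmf and uses it for the basic pairwise-comparison primitive that conditional-sampling testers often build on: conditioning on a two-element set $\{x, y\}$ to estimate the relative weight of $x$ versus $y$.

        import random

        def make_cond_oracle(D):
            # Simulate a conditional sampling oracle COND_D for a known pmf D,
            # given as a dict {point: probability} over the domain [N].
            # A tester only gets to call the returned function, never D itself.
            def cond(S):
                S = list(S)
                weights = [D.get(x, 0.0) for x in S]
                if sum(weights) == 0:
                    return random.choice(S)          # convention for zero-mass sets
                return random.choices(S, weights=weights)[0]
            return cond

        def estimate_pair_weight(cond, x, y, trials=2_000):
            # Estimate D(x) / (D(x) + D(y)) by repeatedly conditioning on {x, y}:
            # the comparison primitive underlying many COND testers.
            hits = sum(cond({x, y}) == x for _ in range(trials))
            return hits / trials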

    Testing Conditional Independence of Discrete Distributions

    We study the problem of testing \emph{conditional independence} for discrete distributions. Specifically, given samples from a discrete random variable $(X, Y, Z)$ on domain $[\ell_1] \times [\ell_2] \times [n]$, we want to distinguish, with probability at least $2/3$, the case that $X$ and $Y$ are conditionally independent given $Z$ from the case that $(X, Y, Z)$ is $\epsilon$-far, in $\ell_1$-distance, from every distribution that has this property. Conditional independence is a concept of central importance in probability and statistics, with a range of applications in various scientific domains. As such, the statistical task of testing conditional independence has been extensively studied in various forms within the statistics and econometrics communities for nearly a century. Perhaps surprisingly, this problem has not been previously considered in the framework of distribution property testing, and in particular no tester with sublinear sample complexity is known, even for the important special case that the domains of $X$ and $Y$ are binary. The main algorithmic result of this work is the first conditional independence tester with {\em sublinear} sample complexity for discrete distributions over $[\ell_1] \times [\ell_2] \times [n]$. To complement our upper bounds, we prove information-theoretic lower bounds establishing that the sample complexity of our algorithm is optimal, up to constant factors, for a number of settings. Specifically, for the prototypical setting where $\ell_1, \ell_2 = O(1)$, we show that the sample complexity of testing conditional independence (upper bound and matching lower bound) is \[ \Theta\left(\max\left(n^{1/2}/\epsilon^2,\ \min\left(n^{7/8}/\epsilon,\, n^{6/7}/\epsilon^{8/7}\right)\right)\right). \]
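
    For intuition only, the sketch below computes a naive plug-in version of the quantity being tested: within each slice $Z = z$, the $\ell_1$-distance between the empirical joint of $(X, Y)$ and the product of its empirical marginals, summed with weights proportional to each slice's empirical mass. This plug-in estimator requires far more samples than the paper's sublinear tester; the function name and structure are illustrative assumptions.

        from collections import Counter, defaultdict

        def ci_plugin_statistic(samples):
            # samples: iterable of (x, y, z) triples drawn from the joint distribution.
            # Returns a weighted sum over z of the l1-distance between the empirical
            # joint of (X, Y) given Z = z and the product of its empirical marginals.
            samples = list(samples)
            n = len(samples)
            by_z = defaultdict(list)
            for x, y, z in samples:
                by_z[z].append((x, y))
            stat = 0.0
            for z, pairs in by_z.items():
                m = len(pairs)
                joint = Counter(pairs)
                px = Counter(x for x, _ in pairs)
                py = Counter(y for _, y in pairs)
                dist = sum(abs(joint[(x, y)] / m - (px[x] / m) * (py[y] / m))
                           for x in px for y in py)
                stat += (m / n) * dist       # weight each slice by its empirical mass
            return stat                      # small => consistent with conditional independence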

    Learning circuits with few negations

    Monotone Boolean functions, and the monotone Boolean circuits that compute them, have been intensively studied in complexity theory. In this paper we study the structure of Boolean functions in terms of the minimum number of negations in any circuit computing them, a complexity measure that interpolates between monotone functions and the class of all functions. We study this generalization of monotonicity from the vantage point of learning theory, giving near-matching upper and lower bounds on the uniform-distribution learnability of circuits in terms of the number of negations they contain. Our upper bounds are based on a new structural characterization of negation-limited circuits that extends a classical result of A. A. Markov. Our lower bounds, which employ Fourier-analytic tools from hardness amplification, give new results even for circuits with no negations (i.e., monotone functions).
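
    As a toy illustration of the Markov-style structure the upper bounds build on (brute force, usable only for very small $n$, with names of our own choosing): by A. A. Markov's classical theorem, the minimum number of negations needed to compute a Boolean function $f$ equals $\lceil \log_2(d(f)+1) \rceil$, where the decrease $d(f)$ is the largest number of $1 \to 0$ drops of $f$ along any increasing chain of the Boolean cube.

        from itertools import permutations

        def markov_negation_bound(f, n):
            # Brute-force computation of Markov's bound for tiny n (exponential time).
            # f: function taking an n-tuple of 0/1 values and returning 0 or 1.
            # d(f) = max number of 1 -> 0 drops of f along any increasing chain;
            # Markov's theorem: the minimum number of negations is ceil(log2(d(f)+1)).
            d = 0
            for order in permutations(range(n)):       # each maximal chain of {0,1}^n
                point = [0] * n
                values = [f(tuple(point))]
                for i in order:                        # flip bits from 0 to 1, one at a time
                    point[i] = 1
                    values.append(f(tuple(point)))
                drops = sum(1 for j in range(len(values) - 1)
                            if values[j] and not values[j + 1])
                d = max(d, drops)
            return d.bit_length()                      # == ceil(log2(d + 1))

        # Sanity check: a monotone function such as 3-way OR needs no negations.
        assert markov_negation_bound(lambda bits: int(any(bits)), 3) == 0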