Blowfish Privacy: Tuning Privacy-Utility Trade-offs using Policies
Privacy definitions provide ways to trade off the privacy of individuals
in a statistical database for the utility of downstream analysis of the data.
In this paper, we present Blowfish, a class of privacy definitions inspired by
the Pufferfish framework, that provides a rich interface for this trade-off. In
particular, we allow data publishers to extend differential privacy using a
policy, which specifies (a) secrets, or information that must be kept secret,
and (b) constraints that may be known about the data. While the secret
specification allows increased utility by lessening protection for certain
individual properties, the constraint specification provides added protection
against an adversary who knows correlations in the data (arising from
constraints). We formalize policies and present novel algorithms that can
handle general specifications of sensitive information and certain count
constraints. We show that there are reasonable policies under which our privacy
mechanisms for k-means clustering, histograms and range queries introduce
significantly less noise than their differentially private counterparts. We
quantify the privacy-utility trade-offs for various policies analytically and
empirically on real datasets.
Comment: Full version of the paper at SIGMOD'14, Snowbird, Utah, US
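To make the effect of a secret specification on noise concrete, here is a minimal Python sketch. It is not the paper's mechanism. It assumes a hypothetical distance-threshold secret graph over an ordered domain {0, ..., k-1}, in which only values at most `theta` apart must be indistinguishable, and it applies the standard Laplace mechanism to a cumulative histogram (a common building block for range queries). The function names and the `theta` parameter are illustrative assumptions, and the constraint part of a policy is not modeled.

```python
import random
from itertools import combinations


def cumulative_histogram(values, domain_size):
    """S[i] = number of records whose value is <= i, for i in 0..domain_size-1."""
    counts = [0] * domain_size
    for v in values:
        counts[v] += 1
    result, running = [], 0
    for c in counts:
        running += c
        result.append(running)
    return result


def secret_pairs(domain_size, theta):
    """Hypothetical distance-threshold secret graph: values x < y form a secret pair
    when y - x <= theta. theta = domain_size - 1 makes every pair secret (complete graph)."""
    return [(x, y) for x, y in combinations(range(domain_size), 2) if y - x <= theta]


def policy_sensitivity(domain_size, theta):
    """L1 sensitivity of the cumulative histogram when neighbors differ by changing one
    record's value x to y for a secret pair (x, y). The histogram is additive over records,
    so comparing the single-record datasets [x] and [y] gives the exact change."""
    return max(
        sum(abs(a - b) for a, b in zip(cumulative_histogram([x], domain_size),
                                       cumulative_histogram([y], domain_size)))
        for x, y in secret_pairs(domain_size, theta)
    )


def laplace(scale, rng):
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_cumulative_histogram(values, domain_size, epsilon, theta, seed=None):
    """Laplace mechanism with noise calibrated to the policy-specific sensitivity."""
    rng = random.Random(seed)
    scale = policy_sensitivity(domain_size, theta) / epsilon
    return [s + laplace(scale, rng) for s in cumulative_histogram(values, domain_size)]


# Complete secret graph vs. adjacent-values-only policy on a domain of 10 values.
print(policy_sensitivity(10, 9), policy_sensitivity(10, 1))  # prints: 9 1
```

The complete graph mirrors the change-one-record neighbor relation of bounded differential privacy, giving a noise scale of (k-1)/epsilon, while the adjacent-values policy (theta = 1) needs only 1/epsilon. Under these assumptions this illustrates why lessening protection for some value pairs can reduce noise for range queries. The sketch assumes `domain_size >= 2` so that at least one secret pair exists.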
An Analysis of Structured Data on the Web
In this paper, we analyze the nature and distribution of structured data on
the Web. Web-scale information extraction, the problem of building
structured tables by extracting data from the entire Web, is attracting
considerable research interest. We perform a study to understand and quantify the value of
Web-scale extraction, and how structured information is distributed amongst top
aggregator websites and tail sites for various domains. We believe this is
the first study of its kind and that it offers new insights into information
extraction over the Web.
Comment: VLDB201
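One simple way to quantify how structured data is spread between large aggregators and tail sites is a cumulative coverage curve: rank sites by how many entities they expose and track the fraction of all entities covered by the top-k sites. The sketch below is an illustrative assumption, not necessarily the study's methodology; `coverage_curve` is a hypothetical helper, and the site names and entity sets are toy data, not results from the paper.

```python
def coverage_curve(site_entities, universe):
    """Fraction of `universe` covered by the top-k sites, for k = 1..len(site_entities).

    site_entities maps a site name to the set of entity ids it exposes as structured data.
    Sites are ranked by how many entities they expose (largest aggregators first); ties keep
    the input order because sorted() is stable, including with reverse=True.
    """
    ranked = sorted(site_entities, key=lambda s: len(site_entities[s]), reverse=True)
    covered, curve = set(), []
    for site in ranked:
        covered |= site_entities[site] & universe
        curve.append(len(covered) / len(universe))
    return curve


# Hypothetical toy data: one aggregator and two tail sites over six entities.
toy = {
    "aggregator.example": {1, 2, 3, 4},
    "tail-a.example": {4, 5},
    "tail-b.example": {6},
}
print(coverage_curve(toy, set(range(1, 7))))
```

A curve that reaches high coverage only after many sites suggests that much of the structured data lives in the tail. Ranking by size is a simple choice; it is not the ordering that maximizes coverage at each k, which would require solving a set-cover-style problem.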
