1,212 research outputs found
Equivalence of SQL Queries in Presence of Embedded Dependencies
We consider the problem of finding equivalent minimal-size reformulations of
SQL queries in presence of embedded dependencies [1]. Our focus is on
select-project-join (SPJ) queries with equality comparisons, also known as safe
conjunctive (CQ) queries, possibly with grouping and aggregation. For SPJ
queries, the semantics of the SQL standard treat query answers as multisets
(a.k.a. bags), whereas the stored relations may be treated either as sets,
which is called bag-set semantics for query evaluation, or as bags, which is
called bag semantics. (Under set semantics, both query answers and stored
relations are treated as sets.)
In the context of the above Query-Reformulation Problem, we develop a
comprehensive framework for equivalence of CQ queries under bag and bag-set
semantics in presence of embedded dependencies, and make a number of conceptual
and technical contributions. Specifically, we develop equivalence tests for CQ
queries in presence of arbitrary sets of embedded dependencies under bag and
bag-set semantics, under the condition that chase [9] under set semantics
(set-chase) on the inputs terminates. We also present equivalence tests for
aggregate CQ queries in presence of embedded dependencies. We use our
equivalence tests to develop sound and complete (whenever set-chase on the
inputs terminates) algorithms for solving instances of the Query-Reformulation
Problem with CQ queries under each of bag and bag-set semantics, as well as for
instances of the problem with aggregate queries.Comment: Correction of the previous version as described in the last sentence
of the Abstrac
WaveCluster with Differential Privacy
WaveCluster is an important family of grid-based clustering algorithms that
are capable of finding clusters of arbitrary shapes. In this paper, we
investigate techniques to perform WaveCluster while ensuring differential
privacy. Our goal is to develop a general technique for achieving differential
privacy on WaveCluster that accommodates different wavelet transforms. We show
that straightforward techniques based on synthetic data generation and
introduction of random noise when quantizing the data, though generally
preserving the distribution of data, often introduce too much noise to preserve
useful clusters. We then propose two optimized techniques, PrivTHR and
PrivTHREM, which can significantly reduce data distortion during two key steps
of WaveCluster: the quantization step and the significant grid identification
step. We conduct extensive experiments based on four datasets that are
particularly interesting in the context of clustering, and show that PrivTHR
and PrivTHREM achieve high utility when privacy budgets are properly allocated
- …
