1,212 research outputs found

    Equivalence of SQL Queries in Presence of Embedded Dependencies

    Full text link
    We consider the problem of finding equivalent minimal-size reformulations of SQL queries in presence of embedded dependencies [1]. Our focus is on select-project-join (SPJ) queries with equality comparisons, also known as safe conjunctive (CQ) queries, possibly with grouping and aggregation. For SPJ queries, the semantics of the SQL standard treat query answers as multisets (a.k.a. bags), whereas the stored relations may be treated either as sets, which is called bag-set semantics for query evaluation, or as bags, which is called bag semantics. (Under set semantics, both query answers and stored relations are treated as sets.) In the context of the above Query-Reformulation Problem, we develop a comprehensive framework for equivalence of CQ queries under bag and bag-set semantics in presence of embedded dependencies, and make a number of conceptual and technical contributions. Specifically, we develop equivalence tests for CQ queries in presence of arbitrary sets of embedded dependencies under bag and bag-set semantics, under the condition that chase [9] under set semantics (set-chase) on the inputs terminates. We also present equivalence tests for aggregate CQ queries in presence of embedded dependencies. We use our equivalence tests to develop sound and complete (whenever set-chase on the inputs terminates) algorithms for solving instances of the Query-Reformulation Problem with CQ queries under each of bag and bag-set semantics, as well as for instances of the problem with aggregate queries.Comment: Correction of the previous version as described in the last sentence of the Abstrac

    WaveCluster with Differential Privacy

    Full text link
    WaveCluster is an important family of grid-based clustering algorithms that are capable of finding clusters of arbitrary shapes. In this paper, we investigate techniques to perform WaveCluster while ensuring differential privacy. Our goal is to develop a general technique for achieving differential privacy on WaveCluster that accommodates different wavelet transforms. We show that straightforward techniques based on synthetic data generation and introduction of random noise when quantizing the data, though generally preserving the distribution of data, often introduce too much noise to preserve useful clusters. We then propose two optimized techniques, PrivTHR and PrivTHREM, which can significantly reduce data distortion during two key steps of WaveCluster: the quantization step and the significant grid identification step. We conduct extensive experiments based on four datasets that are particularly interesting in the context of clustering, and show that PrivTHR and PrivTHREM achieve high utility when privacy budgets are properly allocated
    corecore