74 research outputs found

    InterPoll: Crowd-Sourced Internet Polls

    Get PDF
    Crowd-sourcing is increasingly being used to provide answers to online polls and surveys. However, existing systems, while taking care of the mechanics of attracting crowd workers, poll building, and payment, provide little to help the survey-maker or pollster in obtaining statistically significant results devoid of even the obvious selection biases. This paper proposes InterPoll, a platform for programming of crowd-sourced polls. Pollsters express polls as embedded LINQ queries and the runtime correctly reasons about uncertainty in those polls, only polling as many people as required to meet statistical guarantees. To optimize the cost of polls, InterPoll performs query optimization, as well as bias correction and power analysis. The goal of InterPoll is to provide a system that can be reliably used for research into marketing, social and political science questions. This paper highlights some of the existing challenges and how InterPoll is designed to address most of them. In this paper we summarize some of the work we have already done and give an outline for future work

    Jumping the ORDER BY Barrier in Large-Scale Pattern Matching

    Get PDF
    Event-series pattern matching is a major component of large-scale data analytics pipelines enabling a wide range of system diagnostics tasks. A precursor to pattern matching is an expensive ``shuffle the world'' stage wherein data are ordered by time and shuffled across the network. Because many existing systems treat the pattern matching engine as a black box, they are unable to optimizing the entire data analytics pipeline, and in particular, this costly shuffle. This paper demonstrates how to optimize such queries. We first translate an expressive class of regular-expression like patterns to relational queries such that they can benefit from decades of progress in relational optimizers, and then we introduce the technique of abstract pattern matching, a linear time preprocessing step which, adapting ideas from symbolic execution and abstract interpretation, discards events from the input guaranteed not to appear in successful matches. Abstract pattern matching first computes a conservative representation of the output-relevant domain of every transition in a pattern based on the (unary) predicates of that transition. It then further refines these domains based on the structure of the pattern (i.e., paths through the pattern) as well as any of the pattern's join predicates across transitions. The outcome is an abstract filter that when applied to the original stream excludes events that are guaranteed not to participate in a match. We implemented and applied abstract pattern matching in COSMOS/Scope to an industrial benchmark where we obtained up to 3 orders of magnitude reduction in shuffled data and 1.23x average speedup in total processing time
    corecore