1,192 research outputs found

    Early Accurate Results for Advanced Analytics on MapReduce

    Full text link
    Approximate results based on samples often provide the only way in which advanced analytical applications on very massive data sets can satisfy their time and resource constraints. Unfortunately, methods and tools for the computation of accurate early results are currently not supported in MapReduce-oriented systems although these are intended for `big data'. Therefore, we proposed and implemented a non-parametric extension of Hadoop which allows the incremental computation of early results for arbitrary work-flows, along with reliable on-line estimates of the degree of accuracy achieved so far in the computation. These estimates are based on a technique called bootstrapping that has been widely employed in statistics and can be applied to arbitrary functions and data distributions. In this paper, we describe our Early Accurate Result Library (EARL) for Hadoop that was designed to minimize the changes required to the MapReduce framework. Various tests of EARL of Hadoop are presented to characterize the frequent situations where EARL can provide major speed-ups over the current version of Hadoop.Comment: VLDB201

    Towards More Data-Aware Application Integration (extended version)

    Full text link
    Although most business application data is stored in relational databases, programming languages and wire formats in integration middleware systems are not table-centric. Due to costly format conversions, data-shipments and faster computation, the trend is to "push-down" the integration operations closer to the storage representation. We address the alternative case of defining declarative, table-centric integration semantics within standard integration systems. For that, we replace the current operator implementations for the well-known Enterprise Integration Patterns by equivalent "in-memory" table processing, and show a practical realization in a conventional integration system for a non-reliable, "data-intensive" messaging example. The results of the runtime analysis show that table-centric processing is promising already in standard, "single-record" message routing and transformations, and can potentially excel the message throughput for "multi-record" table messages.Comment: 18 Pages, extended version of the contribution to British International Conference on Databases (BICOD), 2015, Edinburgh, Scotlan

    Declarative Algorithms in Datalog with Extrema: Their Formal Semantics Simplified

    Get PDF
    Recent advances are making possible the use of aggregates in recursive queries thus enabling the declarative expression classic algorithms and their efficient and scalable implementation. These advances rely the notion of Pre-Mappability (PreM) of constraints that, along with the seminaive-fixpoint operational semantics, guarantees formal non-monotonic semantics for recursive programs with min and max constraints. In this extended abstract, we introduce basic templates to simplify and automate task of proving PreM
    corecore