1,192 research outputs found
Early Accurate Results for Advanced Analytics on MapReduce
Approximate results based on samples often provide the only way in which
advanced analytical applications on very massive data sets can satisfy their
time and resource constraints. Unfortunately, methods and tools for the
computation of accurate early results are currently not supported in
MapReduce-oriented systems although these are intended for `big data'.
Therefore, we proposed and implemented a non-parametric extension of Hadoop
which allows the incremental computation of early results for arbitrary
work-flows, along with reliable on-line estimates of the degree of accuracy
achieved so far in the computation. These estimates are based on a technique
called bootstrapping that has been widely employed in statistics and can be
applied to arbitrary functions and data distributions. In this paper, we
describe our Early Accurate Result Library (EARL) for Hadoop that was designed
to minimize the changes required to the MapReduce framework. Various tests of
EARL of Hadoop are presented to characterize the frequent situations where EARL
can provide major speed-ups over the current version of Hadoop.Comment: VLDB201
Towards More Data-Aware Application Integration (extended version)
Although most business application data is stored in relational databases,
programming languages and wire formats in integration middleware systems are
not table-centric. Due to costly format conversions, data-shipments and faster
computation, the trend is to "push-down" the integration operations closer to
the storage representation.
We address the alternative case of defining declarative, table-centric
integration semantics within standard integration systems. For that, we replace
the current operator implementations for the well-known Enterprise Integration
Patterns by equivalent "in-memory" table processing, and show a practical
realization in a conventional integration system for a non-reliable,
"data-intensive" messaging example. The results of the runtime analysis show
that table-centric processing is promising already in standard, "single-record"
message routing and transformations, and can potentially excel the message
throughput for "multi-record" table messages.Comment: 18 Pages, extended version of the contribution to British
International Conference on Databases (BICOD), 2015, Edinburgh, Scotlan
Declarative Algorithms in Datalog with Extrema: Their Formal Semantics Simplified
Recent advances are making possible the use of aggregates in recursive queries thus enabling the declarative expression classic algorithms and their efficient and scalable implementation. These advances rely the notion of Pre-Mappability (PreM) of constraints that, along with the seminaive-fixpoint operational semantics, guarantees formal non-monotonic semantics for recursive programs with min and max constraints. In this extended abstract, we introduce basic templates to simplify and automate task of proving PreM
- …
