551 research outputs found
OS-Assisted Task Preemption for Hadoop
This work introduces a new task preemption primitive for Hadoop, that allows
tasks to be suspended and resumed exploiting existing memory management
mechanisms readily available in modern operating systems. Our technique fills
the gap that exists between the two extremes cases of killing tasks (which
waste work) or waiting for their completion (which introduces latency):
experimental results indicate superior performance and very small overheads
when compared to existing alternatives
PSBS: Practical Size-Based Scheduling
Size-based schedulers have very desirable performance properties: optimal or
near-optimal response time can be coupled with strong fairness guarantees.
Despite this, such systems are very rarely implemented in practical settings,
because they require knowing a priori the amount of work needed to complete
jobs: this assumption is very difficult to satisfy in concrete systems. It is
definitely more likely to inform the system with an estimate of the job sizes,
but existing studies point to somewhat pessimistic results if existing
scheduler policies are used based on imprecise job size estimations. We take
the goal of designing scheduling policies that are explicitly designed to deal
with inexact job sizes: first, we show that existing size-based schedulers can
have bad performance with inexact job size information when job sizes are
heavily skewed; we show that this issue, and the pessimistic results shown in
the literature, are due to problematic behavior when large jobs are
underestimated. Once the problem is identified, it is possible to amend
existing size-based schedulers to solve the issue. We generalize FSP -- a fair
and efficient size-based scheduling policy -- in order to solve the problem
highlighted above; in addition, our solution deals with different job weights
(that can be assigned to a job independently from its size). We provide an
efficient implementation of the resulting protocol, which we call Practical
Size-Based Scheduler (PSBS). Through simulations evaluated on synthetic and
real workloads, we show that PSBS has near-optimal performance in a large
variety of cases with inaccurate size information, that it performs fairly and
it handles correctly job weights. We believe that this work shows that PSBS is
indeed pratical, and we maintain that it could inspire the design of schedulers
in a wide array of real-world use cases.Comment: arXiv admin note: substantial text overlap with arXiv:1403.599
Cloud-based Content Distribution on a Budget
To leverage the elastic nature of cloud computing, a solution provider must be able to accurately gauge demand for its offering. For applications that involve swarm-to-cloud interactions, gauging such demand is not straightforward. In this paper, we propose a general framework, analyze a mathematical model, and present a prototype implementation of a canonical swarm-to-cloud application, namely peer-assisted content delivery. Our system – called Cyclops – dynamically adjusts the off-cloud bandwidth consumed by content servers (which represents the bulk of the provider's cost) to feed a set of swarming clients, based on a feedback signal that gauges the real-time health of the swarm. Our extensive evaluation of Cyclops in a variety of settings – including controlled PlanetLab and live Internet experiments involving thousands of users – show significant reduction in content distribution costs (by as much as two orders of magnitude) when compared to non-feedback-based swarming solutions, with minor impact on content delivery times
Experimental Performance Evaluation of Cloud-Based Analytics-as-a-Service
An increasing number of Analytics-as-a-Service solutions has recently seen
the light, in the landscape of cloud-based services. These services allow
flexible composition of compute and storage components, that create powerful
data ingestion and processing pipelines. This work is a first attempt at an
experimental evaluation of analytic application performance executed using a
wide range of storage service configurations. We present an intuitive notion of
data locality, that we use as a proxy to rank different service compositions in
terms of expected performance. Through an empirical analysis, we dissect the
performance achieved by analytic workloads and unveil problems due to the
impedance mismatch that arise in some configurations. Our work paves the way to
a better understanding of modern cloud-based analytic services and their
performance, both for its end-users and their providers.Comment: Longer version of the paper in Submission at IEEE CLOUD'1
- …
