Delay versus Stickiness Violation Trade-offs for Load Balancing in Large-Scale Data Centers
Most load balancing techniques implemented in current data centers tend to
rely on a mapping from packets to server IP addresses through a hash value
calculated from the flow five-tuple. The hash calculation allows extremely fast
packet forwarding and provides flow "stickiness", meaning that all packets
belonging to the same flow get dispatched to the same server. Unfortunately,
such static hashing may not yield an optimal degree of load balancing, e.g.,
due to variations in server processing speeds or traffic patterns. On the other
hand, dynamic schemes, such as the Join-the-Shortest-Queue (JSQ) scheme,
provide a natural way to mitigate load imbalances, but at the expense of
stickiness violation.
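The contrast between the two dispatching modes can be made concrete. The sketch below is illustrative only (the helper names `hash_dispatch` and `jsq_dispatch` are ours, not the paper's): static five-tuple hashing preserves stickiness by construction, while JSQ reacts to instantaneous queue lengths.

```python
import hashlib

def hash_dispatch(five_tuple, num_servers):
    # Static hashing: the five-tuple fully determines the server, so every
    # packet of the same flow is dispatched to the same server (stickiness).
    digest = hashlib.sha256(repr(five_tuple).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_servers

def jsq_dispatch(queue_lengths):
    # Join-the-Shortest-Queue: pick the server with the fewest queued packets,
    # regardless of which flow the packet belongs to (stickiness may break).
    return min(range(len(queue_lengths)), key=queue_lengths.__getitem__)

flow = ("10.0.0.1", "192.168.0.7", 49152, 80, "TCP")  # src, dst, sport, dport, proto
assert hash_dispatch(flow, 16) == hash_dispatch(flow, 16)  # sticky by construction
```

Note that `hash_dispatch` never consults server state, which is exactly why it cannot correct load imbalances.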
In the present paper we examine the fundamental trade-off between stickiness
violation and packet-level latency performance in large-scale data centers. We
establish that stringent flow stickiness carries a significant performance
penalty in terms of packet-level delay. Moreover, relaxing the stickiness
requirement by a minuscule amount is highly effective in clipping the tail of
the latency distribution. We further propose a bin-based load balancing scheme
that achieves a good balance among scalability, stickiness violation and
packet-level delay performance. Extensive simulation experiments corroborate
the analytical results and validate the effectiveness of the bin-based load
balancing scheme.
Lingering Issues in Distributed Scheduling
Recent advances have resulted in queue-based algorithms for medium access
control which operate in a distributed fashion, and yet achieve the optimal
throughput performance of centralized scheduling algorithms. However,
fundamental performance bounds reveal that the "cautious" activation rules
involved in establishing throughput optimality tend to produce extremely large
delays, typically growing exponentially in 1/(1-r), with r the load of the
system, in contrast to the usual linear growth.
Motivated by that issue, we explore to what extent more "aggressive" schemes
can improve the delay performance. Our main finding is that aggressive
activation rules induce a lingering effect, where individual nodes retain
possession of a shared resource for excessive lengths of time even while a
majority of other nodes idle. Using central limit theorem type arguments, we
prove that the idleness induced by the lingering effect may cause the delays to
grow quadratically in 1/(1-r). To the best of our knowledge, these are
the first mathematical results illuminating the lingering effect and
quantifying the performance impact.
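As a back-of-the-envelope illustration (scaling shapes only; constants and exact forms are suppressed and are not the paper's bounds), the three growth regimes mentioned above diverge very differently as the load r approaches 1:

```python
import math

def linear_delay(r):
    # Conventional queueing growth, as in an M/M/1 queue: ~ 1/(1-r).
    return 1.0 / (1.0 - r)

def quadratic_delay(r):
    # Growth attributed above to the lingering effect: ~ 1/(1-r)^2.
    return 1.0 / (1.0 - r) ** 2

def exponential_delay(r):
    # Growth attributed to "cautious" activation rules: ~ exp(1/(1-r)).
    return math.exp(1.0 / (1.0 - r))

for r in (0.5, 0.8, 0.9):
    print(f"r={r}: linear={linear_delay(r):.1f} "
          f"quadratic={quadratic_delay(r):.1f} "
          f"exponential={exponential_delay(r):.1f}")
```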
In addition, extensive simulation experiments are conducted to illustrate and
validate the various analytical results.
Queue-Based Random-Access Algorithms: Fluid Limits and Stability Issues
We use fluid limits to explore the (in)stability properties of wireless
networks with queue-based random-access algorithms. Queue-based random-access
schemes are simple and inherently distributed in nature, yet provide the
capability to match the optimal throughput performance of centralized
scheduling mechanisms in a wide range of scenarios. Unfortunately, the type of
activation rules for which throughput optimality has been established may
result in excessive queue lengths and delays. The use of more
aggressive/persistent access schemes can improve the delay performance, but
does not offer any universal maximum-stability guarantees. In order to gain
qualitative insight and investigate the (in)stability properties of more
aggressive/persistent activation rules, we examine fluid limits where the
dynamics are scaled in space and time. In some situations, the fluid limits
have smooth deterministic features and maximum stability is maintained, while
in other scenarios they exhibit random oscillatory characteristics, giving rise
to major technical challenges. In the latter regime, more aggressive access
schemes continue to provide maximum stability in some networks, but may cause
instability in others. Simulation experiments are conducted to illustrate and
validate the analytical results.
Exact asymptotics for fluid queues fed by multiple heavy-tailed on-off flows
We consider a fluid queue fed by multiple On-Off flows with heavy-tailed
(regularly varying) On periods. Under fairly mild assumptions, we prove that
the workload distribution is asymptotically equivalent to that in a reduced
system. The reduced system consists of a "dominant" subset of the flows, with
the original service rate subtracted by the mean rate of the other flows. We
describe how a dominant set may be determined from a simple knapsack
formulation. The dominant set consists of a "minimally critical" set of
On-Off flows with regularly varying On periods. In case the dominant set
contains just a single On-Off flow, the exact asymptotics for the reduced
system follow from known results. For the case of several
On-Off flows, we exploit a powerful intuitive argument to obtain the exact
asymptotics. Combined with the reduced-load equivalence, the results for the
reduced system provide a characterization of the tail of the workload
distribution for a wide range of traffic scenarios.
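The abstract does not spell out the knapsack formulation. One plausible reading (an assumption on our part, following the usual reduced-load setup, not a statement of the paper's definition) is that a dominant set is a smallest set of flows whose combined peak rates, together with the mean rates of the remaining flows, exceed the service capacity. A brute-force sketch of that reading:

```python
from itertools import combinations

def dominant_set(peak, mean, capacity):
    # Hypothetical reading of the knapsack formulation: return a smallest
    # ("minimally critical") set S of flows such that the peak rates of S plus
    # the mean rates of all other flows exceed the service capacity.
    n = len(peak)
    for size in range(1, n + 1):
        for S in combinations(range(n), size):
            load = (sum(peak[i] for i in S)
                    + sum(mean[i] for i in range(n) if i not in S))
            if load > capacity:
                return set(S)
    return None  # no subset can push the system into overflow

# Two flows with peak rate 2 and mean rate 0.5 each, capacity 3: one flow at
# its peak (2) plus the other's mean (0.5) stays below 3, so both are needed.
print(dominant_set([2.0, 2.0], [0.5, 0.5], 3.0))  # → {0, 1}
```

Minimality here is by cardinality, which may differ from the paper's precise criterion.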
Delay Performance and Mixing Times in Random-Access Networks
We explore the achievable delay performance in wireless random-access
networks. While relatively simple and inherently distributed in nature,
suitably designed queue-based random-access schemes provide the striking
capability to match the optimal throughput performance of centralized
scheduling mechanisms in a wide range of scenarios. The specific type of
activation rules for which throughput optimality has been established may,
however, yield excessive queues and delays.
Motivated by that issue, we examine whether the poor delay performance is
inherent to the basic operation of these schemes, or caused by the specific
kind of activation rules. We derive delay lower bounds for queue-based
activation rules, which offer fundamental insight in the cause of the excessive
delays. For fixed activation rates we obtain lower bounds indicating that
delays and mixing times can grow dramatically with the load in certain
topologies as well.
GPS queues with heterogeneous traffic classes
We consider a queue fed by a mixture of light-tailed and heavy-tailed traffic.
The two traffic classes are served in accordance with the generalized
processor sharing (GPS) discipline. GPS-based scheduling algorithms, such as
weighted fair queueing (WFQ), have emerged as an important mechanism for
achieving service differentiation in integrated networks. We derive the
asymptotic workload behavior of the light-tailed class for the situation where
its GPS weight is larger than its traffic intensity. The GPS mechanism ensures
that the workload is bounded above by that in an isolated system with the
light-tailed class served in isolation at a constant rate equal to its GPS
weight. We show that the workload distribution is in fact asymptotically
equivalent to that in the isolated system, multiplied by a certain pre-factor,
which accounts for the interaction with the heavy-tailed class. Specifically,
the pre-factor represents the probability that the heavy-tailed class is
backlogged long enough for the light-tailed class to reach overflow. The
results provide crucial qualitative insight into the typical overflow
scenario.
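The upper bound described above can be checked numerically in a coarse slotted approximation of GPS (a sketch under simplifying assumptions of our own, not the paper's model: time is slotted, total capacity is 1, and a backlogged class receives at least its weight share):

```python
import random

def gps_bound_check(arrivals1, arrivals2, w1=0.6):
    # Slotted two-class GPS sketch: a backlogged class gets at least its weight
    # share of the unit capacity; if the other class is empty it gets the full
    # capacity. Class 1 is compared against an isolated queue drained at the
    # constant rate w1 with the same arrivals.
    w2 = 1.0 - w1
    q1 = q2 = iso1 = 0.0
    for a1, a2 in zip(arrivals1, arrivals2):
        r1 = 1.0 if q2 + a2 == 0 else w1
        r2 = 1.0 if q1 + a1 == 0 else w2
        q1 = max(q1 + a1 - r1, 0.0)
        q2 = max(q2 + a2 - r2, 0.0)
        iso1 = max(iso1 + a1 - w1, 0.0)
        assert q1 <= iso1 + 1e-9  # GPS workload bounded by the isolated system
    return q1, iso1

random.seed(0)
a1 = [random.expovariate(2.0) for _ in range(10_000)]   # mean rate 0.5 < w1
a2 = [random.expovariate(1.25) for _ in range(10_000)]  # class 2 heavily loaded
q1, iso1 = gps_bound_check(a1, a2)
```

The bound holds slot by slot because class 1's service rate is never below w1, which mirrors the mechanism invoked in the abstract.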
Load Balancing in Large-Scale Systems with Multiple Dispatchers
Load balancing algorithms play a crucial role in delivering robust
application performance in data centers and cloud networks. Recently, strong
interest has emerged in Join-the-Idle-Queue (JIQ) algorithms, which rely on
tokens issued by idle servers in dispatching tasks and outperform power-of-d
policies. Specifically, JIQ strategies involve minimal information exchange,
and yet achieve zero blocking and wait in the many-server limit. The latter
property prevails in a multiple-dispatcher scenario when the loads are strictly
equal among dispatchers. For various reasons it is not uncommon however for
skewed load patterns to occur. We leverage product-form representations and
fluid limits to establish that the blocking and wait then no longer vanish,
even for arbitrarily low overall load. Remarkably, it is the least-loaded
dispatcher that throttles tokens and leaves idle servers stranded, thus acting
as bottleneck.
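For reference, the basic token mechanics of ordinary JIQ can be sketched as follows (function names are ours; the fallback rule for a dispatcher holding no tokens is assumed here to be a uniformly random server):

```python
import random

def deposit_token(token_pools, server_id):
    # A server that turns idle leaves a token at a uniformly chosen dispatcher.
    random.choice(token_pools).append(server_id)

def jiq_dispatch(token_pools, dispatcher, num_servers):
    # A dispatcher holding tokens sends the task to a known-idle server and
    # consumes the token; otherwise it falls back to a random server.
    pool = token_pools[dispatcher]
    if pool:
        return pool.pop()
    return random.randrange(num_servers)

pools = [[], []]         # two dispatchers
deposit_token(pools, 7)  # server 7 became idle
```

The skew problem described above arises because tokens pile up at whichever dispatcher consumes them most slowly.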
Motivated by the above issues, we introduce two enhancements of the ordinary
JIQ scheme where tokens are either distributed non-uniformly or occasionally
exchanged among the various dispatchers. We prove that these extensions can
achieve zero blocking and wait in the many-server limit, for any subcritical
overall load and arbitrarily skewed load profiles. Extensive simulation
experiments demonstrate that the asymptotic results are highly accurate, even
for moderately sized systems.
Hyper-Scalable JSQ with Sparse Feedback
Load balancing algorithms play a vital role in enhancing performance in data
centers and cloud networks. Due to the massive size of these systems,
scalability challenges, and especially the communication overhead associated
with load balancing mechanisms, have emerged as major concerns. Motivated by
these issues, we introduce and analyze a novel class of load balancing schemes
where the various servers provide occasional queue updates to guide the load
assignment.
We show that the proposed schemes strongly outperform JSQ(d) strategies
with comparable communication overhead per job, and can achieve a vanishing
waiting time in the many-server limit with just one message per job, just like
the popular JIQ scheme. The proposed schemes are particularly geared however
towards the sparse feedback regime with less than one message per job, where
they outperform corresponding sparsified JIQ versions.
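The JSQ(d) baseline referred to above is standard: probe d servers chosen at random and join the shortest probed queue, at a cost of roughly d messages per job. A minimal sketch:

```python
import random

def jsq_d_dispatch(queue_lengths, d):
    # Power-of-d sampling: probe d distinct servers uniformly at random and
    # join the shortest of the probed queues.
    probed = random.sample(range(len(queue_lengths)), d)
    return min(probed, key=queue_lengths.__getitem__)

queues = [4, 0, 3, 5]
choice = jsq_d_dispatch(queues, d=2)  # the shorter of two random probes
```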
We investigate fluid limits for synchronous updates as well as asynchronous
exponential update intervals. The fixed point of the fluid limit is identified
in the latter case, and used to derive the queue length distribution. We also
demonstrate that in the ultra-low feedback regime the mean stationary waiting
time tends to a constant in the synchronous case, but grows without bound in
the asynchronous case.
Achievable Performance in Product-Form Networks
We characterize the achievable range of performance measures in product-form
networks where one or more system parameters can be freely set by a network
operator. Given a product-form network and a set of configurable parameters, we
identify which performance measures can be controlled and which target values
can be attained. We also discuss an online optimization algorithm, which allows
a network operator to set the system parameters so as to achieve target
performance metrics. In some cases, the algorithm can be implemented in a
distributed fashion, of which we give several examples. Finally, we give
conditions that guarantee convergence of the algorithm, under the assumption
that the target performance metrics are within the achievable range.
Comment: 50th Annual Allerton Conference on Communication, Control and
Computing - 201
