Scalable Load Balancing in Networked Systems: Universality Properties and Stochastic Coupling Methods

@article{Boor2019ScalableLB,
  title={Scalable Load Balancing in Networked Systems: Universality Properties and Stochastic Coupling Methods},
  author={Mark van der Boor and Sem C. Borst and Johan van Leeuwaarden and Debankur Mukherjee},
  journal={ArXiv},
  year={2019},
  volume={abs/1712.08555}
}
We present an overview of scalable load balancing algorithms which provide favorable delay performance in large-scale systems, and yet only require minimal implementation overhead. Aimed at a broad audience, the paper starts with an introduction to the basic load balancing scenario, consisting of a single dispatcher where tasks arrive that must immediately be forwarded to one of $N$ single-server queues. A popular class of load balancing algorithms are so-called power-of-$d$ or JSQ($d… 
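As a rough illustration of the basic scenario described in the abstract (one dispatcher, $N$ single-server queues, power-of-$d$ assignment), the following Python sketch simulates a JSQ($d$) dispatcher. The function names, the Poisson arrival and unit-exponential service assumptions, and sampling the $d$ candidates without replacement are choices made for this example only and are not taken from the paper.

```python
import random


def jsq_d_choice(queues, d, rng):
    """Return the index of the shortest queue among d servers sampled
    uniformly at random without replacement (an assumption of this sketch)."""
    candidates = rng.sample(range(len(queues)), d)
    return min(candidates, key=lambda i: queues[i])


def simulate_jsq_d(n_servers, load, d, n_events, seed=0):
    """Simulate n_events arrival/departure events and return the
    time-average number of tasks per server.

    Assumes Poisson arrivals of total rate load * n_servers and
    unit-rate exponential service at every server.
    """
    rng = random.Random(seed)
    queues = [0] * n_servers
    arrival_rate = load * n_servers
    total_tasks = 0
    clock = 0.0
    area = 0.0  # integral of total_tasks over time

    for _ in range(n_events):
        busy_servers = [i for i, q in enumerate(queues) if q > 0]
        total_rate = arrival_rate + len(busy_servers)
        dt = rng.expovariate(total_rate)
        area += total_tasks * dt
        clock += dt

        if rng.random() < arrival_rate / total_rate:
            # Arrival: the dispatcher forwards the task immediately.
            queues[jsq_d_choice(queues, d, rng)] += 1
            total_tasks += 1
        else:
            # Departure: one busy server completes a task.
            queues[rng.choice(busy_servers)] -= 1
            total_tasks -= 1

    return area / (clock * n_servers)
```

Setting `d=1` gives purely random assignment, while `d=n_servers` corresponds to the full JSQ policy, so the same function can be used to compare the two extremes for a given load.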

Scalable load balancing in networked systems

TLDR
An asymptotic regime where the diversity parameter d(N) depends on N is considered, and it is investigated what growth rate is required to match the optimal performance of the JSQ policy on fluid and diffusion scale, and achieve a vanishing waiting time in the limit.

Scalable load balancing in networked systems: A survey of recent advances

TLDR
It is demonstrated how stochastic coupling techniques and stochastic-process limits play an instrumental role in establishing asymptotic optimality, and how this optimality carries over to infinite-server settings, finite buffers, multiple dispatchers, servers arranged on graph topologies, and token-based load balancing including the popular Join-the-Idle-Queue (JIQ) scheme.

Optimal Hyper-Scalable Load Balancing with a Strict Queue Limit

Asymptotic Optimality of Speed-Aware JSQ for Heterogeneous Systems

TLDR
A speed-aware version of the JSQ scheme for heterogeneous systems is proposed, and it is shown that it achieves delay optimality in the fluid limit and has a unique and globally attractive fixed point.

Load Balancing Under Strict Compatibility Constraints

TLDR
Proportionally sparse random compatibility graphs can be constructed, which reduce the server-degree almost by a factor N/ln(N) compared to the complete bipartite compatibility graph.

Steady-state analysis of load-balancing algorithms in the sub-Halfin-Whitt regime

TLDR
A sufficient condition under which the probability that an incoming job is routed to an idle server is 1 asymptotically (as $N \to \infty$) at steady state is established.

Join-Idle-Queue with Service Elasticity: Large-Scale Asymptotics of a Non-monotone System

TLDR
A novel method is developed to prove that the subcritically loaded system is stable for large enough $N$, and establish convergence of steady-state distributions to the optimal one, as $N \to \infty$.

Join-the-Shortest Queue diffusion limit in Halfin–Whitt regime: Sensitivity on the heavy-traffic parameter

Consider a system of $N$ parallel single-server queues with unit-exponential service time distribution and a single dispatcher where tasks arrive as a Poisson process of rate $\lambda(N)$. When a

Join-the-shortest queue diffusion limit in Halfin–Whitt regime: Tail asymptotics and scaling of extrema

Consider a system of $N$ parallel single-server queues with unit-exponential service time distribution and a single dispatcher where tasks arrive as a Poisson process of rate $\lambda(N)$. When a

Economies-of-scale in resource sharing systems: tutorial and partial review of the QED heavy-traffic regime

TLDR
The mathematics behind the Quality-and-Efficiency Driven (QED) regime, which lets the system operate close to full utilization, while the number of servers grows simultaneously large and delays remain manageable, is reviewed.

References

Universality of load balancing schemes on the diffusion scale

TLDR
A stochastic coupling construction is developed to obtain the diffusion limit of the queue process in the Halfin‒Whitt heavy-traffic regime, and it is established that it does not depend on the value of d, implying that assigning tasks to idle servers is sufficient for diffusion level optimality.

Asymptotic Optimality of Power-of-d Load Balancing in Large-Scale Systems

TLDR
The results indicate that the JSQ optimality can be preserved at the fluid and diffusion levels while reducing the overhead by nearly a factor O(N) and O([Formula: see text]), respectively.

Load Balancing in the Non-Degenerate Slowdown Regime

TLDR
A novel diffusion approximation and timescale separation is identified that provides insights into the performance of Join-the-Shortest-Queue and leads to new rules that have identical performance to JSQ but require less communication overhead than power-of-2-choices.

Delay, Memory, and Messaging Tradeoffs in Distributed Service Systems

TLDR
It is shown that for any given α>0 (no matter how small), if the policy only uses a linear message rate αN, the resulting asymptotic expected queueing delay is positive but upper bounded, uniformly over all λ<1.

Randomized load balancing in heavy traffic

We consider three randomized schemes for load balancing among a fixed number, N, of resources, and analyze them in the heavy traffic limit. The first is join the shortest queue, where arrivals are routed

Hyper-Scalable JSQ with Sparse Feedback

TLDR
A novel class of load balancing schemes where the various servers provide occasional queue updates to guide the load assignment is introduced, and it is shown that the proposed schemes strongly outperform JSQ(d) strategies with comparable communication overhead per job, and can achieve a vanishing waiting time in the many-server limit with just one message per job.

Asymptotically Optimal Load Balancing Topologies

TLDR
It is proved that if $G_N$ is an Erdős-Rényi random graph with average degree $d(N)$, then with high probability it is $N$-optimal and $\sqrt{N}$-optimal if $d(N) \to \infty$ and $d(N) / (\sqrt{N} \log(N)) \to \infty$ as $N \to \infty$, respectively.

Pull-based load distribution among heterogeneous parallel servers: the case of multiple routers

  • A. Stolyar
  • Computer Science
    Queueing Syst. Theory Appl.
  • 2017
TLDR
A multi-router generalization of the pull-based customer assignment (routing) algorithm PULL, introduced in Stolyar (2015), is defined, and asymptotic optimality of PULL is proved: as $n \to \infty$, the steady-state probability of an arriving customer experiencing blocking or waiting vanishes.

Pull-based load distribution in large-scale heterogeneous service systems

  • A. Stolyar
  • Computer Science
    Queueing Syst. Theory Appl.
  • 2015
TLDR
Assuming subcritical system load, asymptotic optimality of PULL is proved: the steady-state probability of an arriving customer experiencing blocking or waiting vanishes as the system scale grows.
...