Hopper: Decentralized Speculation-aware Cluster Scheduling at Scale

@inproceedings{Ren2015HopperDS,
  title={Hopper: Decentralized Speculation-aware Cluster Scheduling at Scale},
  author={Xiaoqi Ren and Ganesh Ananthanarayanan and Adam Wierman and Minlan Yu},
  booktitle={SIGCOMM},
  year={2015}
}
As clusters continue to grow in size and complexity, providing scalable and predictable performance is an increasingly important challenge. A crucial roadblock to achieving predictable performance is stragglers, i.e., tasks that take significantly longer than expected to run. At this point, speculative execution has been widely adopted to mitigate the impact of stragglers. However, speculation mechanisms are designed and operated independently of job scheduling when, in fact, scheduling a…
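To make "stragglers" and "speculative execution" concrete, here is a minimal sketch of one simple, commonly used speculation rule. A running task counts as a straggler when its elapsed time exceeds a multiple of the median duration of already-finished tasks, and duplicate copies are launched into free slots, slowest first. The rule is similar in spirit to Spark's `spark.speculation.multiplier` setting. The `Task` type, the function name `pick_speculative_copies`, the 1.5× threshold, and the slot model are hypothetical illustrations. This is not Hopper's algorithm; the abstract describes Hopper as coordinating speculation with job scheduling.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Task:
    # Hypothetical task record, for illustration only.
    task_id: str
    elapsed: float          # seconds the task has been running so far
    done: bool = False      # True once the task has finished
    duration: float = 0.0   # final duration; meaningful only when done


def pick_speculative_copies(tasks, free_slots, threshold=1.5):
    """Return ids of running tasks to duplicate (hypothetical heuristic).

    A running task is treated as a straggler when its elapsed time exceeds
    `threshold` times the median duration of already-finished tasks. The
    slowest stragglers receive speculative copies first, up to `free_slots`.
    """
    finished = [t.duration for t in tasks if t.done]
    if not finished or free_slots <= 0:
        return []  # no baseline to compare against, or nowhere to run copies
    cutoff = threshold * median(finished)
    stragglers = [t for t in tasks if not t.done and t.elapsed > cutoff]
    stragglers.sort(key=lambda t: t.elapsed, reverse=True)
    return [t.task_id for t in stragglers[:free_slots]]


# Usage: three finished tasks (median 11 s, so cutoff 16.5 s) and three running ones.
tasks = [
    Task("t1", elapsed=10, done=True, duration=10),
    Task("t2", elapsed=12, done=True, duration=12),
    Task("t3", elapsed=11, done=True, duration=11),
    Task("t4", elapsed=30),
    Task("t5", elapsed=20),
    Task("t6", elapsed=14),
]
print(pick_speculative_copies(tasks, free_slots=1))  # ['t4']
```

This rule examines each task in isolation and ignores which job owns a free slot. That makes it an example of speculation operated independently of job scheduling, which is the separation the abstract argues against.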


Key Quantitative Results

  • We implement both centralized and decentralized prototypes of the Hopper scheduler and show that 50% (66%) improvements over state-of-the-art centralized (decentralized) schedulers and speculation strategies can be achieved through the coordination of scheduling and speculation.
  • The decentralized and centralized implementations of Hopper reduce the average job completion time by up to 66% and 50%, respectively, compared to state-of-the-art scheduling and straggler mitigation techniques.
  • We deploy our prototypes (built in Sparrow [36], Spark [49] and Hadoop [3]) on a 200-machine cluster, and see job speedups of 66% in decentralized settings and 50% in centralized settings compared to current state-of-the-art schedulers.
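The percentages above are stated as reductions in average job completion time. The following worked example shows what those figures mean as speedup factors, using the hypothetical symbols $T$ for a baseline scheduler's average completion time and $r$ for the fractional reduction:

```latex
\[
T_{\text{Hopper}} = (1 - r)\,T, \qquad \frac{T}{T_{\text{Hopper}}} = \frac{1}{1 - r}
\]
\[
r = 0.66:\quad T_{\text{Hopper}} = 0.34\,T,\quad \frac{1}{0.34} \approx 2.94;
\qquad
r = 0.50:\quad T_{\text{Hopper}} = 0.50\,T,\quad \frac{1}{0.50} = 2
\]
```

Under this reading, the reported upper bounds correspond to jobs finishing roughly 2.9× faster in the decentralized setting and 2× faster in the centralized setting.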