Spark on entropy: A reliable & efficient scheduler for low-latency parallel jobs in heterogeneous cloud

@article{Chen2015SparkOE,
  title={Spark on entropy: A reliable \& efficient scheduler for low-latency parallel jobs in heterogeneous cloud},
  author={Huankai Chen and Frank Z. Wang},
  journal={2015 IEEE 40th Local Computer Networks Conference Workshops (LCN Workshops)},
  year={2015},
  pages={708--713}
}
In a heterogeneous cloud, providing quality of service (QoS) guarantees for on-line parallel analysis jobs is much more challenging than for off-line ones, mainly because of the many parameters involved, unstable resource performance, varied job patterns, and dynamic query workloads. In this paper we propose an entropy-based scheduling strategy that makes running on-line parallel analysis as a service more reliable and efficient, and we implement the proposed idea in Spark. Entropy, as a measure of the…

Key Quantitative Results

  • Experiments demonstrate that our approach significantly reduces the average query response time by 15% - 20% and the standard deviation by 30% - 45% compared with the native Fair Scheduler in Spark.
  • On average, in this heterogeneous cluster experiment, the Entropy Scheduler shortens the load testing completion time by 23%, reduces the average response time by 23% and the standard deviation by 35%, and improves the overall server throughput by 30% compared with the native Fair Scheduler.

Citations

Publications citing this paper.

Entropy4Cloud: Using Entropy-Based Complexity to Optimize Cloud Service Resource Management

  • IEEE Transactions on Emerging Topics in Computational Intelligence
  • 2018

An energy-aware scheduling algorithm for big data applications in Spark

Hongjian Li, Huochen Wang, Shuyong Fang, Yang Zou, Wenhong Tian
  • Cluster Computing
  • 2019

Using information theory principles to schedule real-time tasks

  • 2017 51st Annual Conference on Information Sciences and Systems (CISS)
  • 2017

A SLA-based Spark cluster scaling method in cloud environment

  • 2016 18th Asia-Pacific Network Operations and Management Symposium (APNOMS)
  • 2016

An overview on cloud computing platform spark for Human Genome mining

  • 2016 IEEE International Conference on Mechatronics and Automation
  • 2016
