Failure Analysis of Hadoop Schedulers using an Integration of Model Checking and Simulation

Mbarka Soualhia, Foutse Khomh, Sofiène Tahar
The Hadoop scheduler is a centerpiece of Hadoop, the leading processing framework for data-intensive applications in the cloud. Given the impact of failures on the performance of applications running on Hadoop, testing and verifying the performance of the Hadoop scheduler is critical. Existing approaches such as performance simulation and analytical modeling are inadequate because they cannot provide a complete verification of a Hadoop scheduler. This is due to the wide range of…

Towards Formal Modeling and Verification of Cloud Architectures: A Case Study on Hadoop
This paper proposes a holistic approach to verify the correctness of Hadoop systems using model checking techniques: it models Hadoop's parallel architecture, constrains it to a valid start-up ordering, and identifies and proves properties including the benefits of data locality, deadlock-freeness, and non-termination.
Understanding the effects and implications of compute node related failures in Hadoop
This paper analyzes Hadoop's behavior under failures involving compute nodes and finds that even a single failure can result in inflated, variable, and unpredictable job running times, all undesirable properties in a distributed system.
Petri Nets Formalization of Map/Reduce Paradigm to Optimise the Performance-Cost Tradeoff
A formalization of the Map/Reduce paradigm is presented and used to evaluate performance parameters and to analyze the trade-off between the number of workers, processing time, and resource cost. Results show that the proposed model can determine in advance both the performance of a Map/Reduce-based application within cloud environments and the best performance-cost agreement.
Using Coq in Specification and Program Extraction of Hadoop MapReduce Applications
The goal of this research is to verify the actual running code of MapReduce applications on the Hadoop MapReduce framework; the feasibility of two approaches was investigated.
Modeling and Verifying HDFS Using CSP
This paper uses Communicating Sequential Processes (CSP) to model and analyze HDFS, focusing on the dominant operations, reading and writing files in HDFS, and formalizing them in detail.
Performance Analysis Using Petri Net Based MapReduce Model in Heterogeneous Clusters
SPN-MR simulates the elapsed time of any MapReduce job with a known input data size, thereby reducing the time cost of performance tuning, and can provide effective performance evaluation reports for MapReduce programmers.
Task Scheduling in Big Data Platforms: A Systematic Literature Review
This SLR analyses the design decisions of different scheduling models proposed in the literature for Hadoop, Spark, Storm, and Mesos over the period 2005–2016 and provides a research taxonomy for succinct classification of these scheduling models.
Real-time systems - scheduling, analysis, and verification
Real-Time Systems: Scheduling, Analysis, and Verification provides a substantial, up-to-date overview of the verification and validation process of real-time systems.
WOHA: Deadline-Aware Map-Reduce Workflow Scheduling Framework over Hadoop Clusters
WOHA is proposed, an efficient scheduling framework for deadline-aware Map-Reduce workflows that improves the deadline satisfaction ratio by dynamically assigning priorities among workflows based on their progress, and that scales up to tens of thousands of concurrently running workflows.
RAFTing MapReduce: Fast recovery on the RAFT
This paper proposes a family of Recovery Algorithms for Fast-Tracking (RAFT) MapReduce, implements RAFT on top of Hadoop, and evaluates it on a 45-node cluster using three common analytical tasks.