Failure Analysis of Hadoop Schedulers using an Integration of Model Checking and Simulation

Mbarka Soualhia, Foutse Khomh, Sofiène Tahar
The Hadoop scheduler is a centerpiece of Hadoop, the leading processing framework for data-intensive applications in the cloud. Given the impact of failures on the performance of applications running on Hadoop, testing and verifying the Hadoop scheduler's behavior is critical. Existing approaches such as performance simulation and analytical modeling are inadequate because they cannot provide complete verification of a Hadoop scheduler. This is due to the wide range of…
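The paper's core idea, exhaustively exploring a scheduler's state space rather than sampling it through simulation, can be illustrated with a minimal explicit-state check. The toy model below (task states, slot limit, and the deadlock-freedom property are all illustrative assumptions, not the paper's actual formal model) enumerates every reachable scheduler state and verifies that no non-final state is stuck:

```python
from collections import deque

# Toy scheduler model: each task is PENDING, RUNNING, or DONE; the
# cluster has a fixed number of slots. Transitions: schedule a pending
# task when a slot is free, complete a running task, or fail a running
# task (it returns to PENDING). Purely illustrative, not the paper's model.
PENDING, RUNNING, DONE = 0, 1, 2

def successors(state, slots):
    """Yield all states reachable from `state` in one transition."""
    running = sum(1 for s in state if s == RUNNING)
    for i, s in enumerate(state):
        if s == PENDING and running < slots:
            yield state[:i] + (RUNNING,) + state[i+1:]   # schedule task i
        elif s == RUNNING:
            yield state[:i] + (DONE,) + state[i+1:]      # task i completes
            yield state[:i] + (PENDING,) + state[i+1:]   # task i fails, requeued

def check_deadlock_free(tasks=3, slots=2):
    """Breadth-first explicit-state search: every reachable non-final
    state must have at least one successor (no scheduler deadlock).
    Returns (True, None) if the property holds, else (False, witness)."""
    init = (PENDING,) * tasks
    seen, queue = {init}, deque([init])
    while queue:
        state = queue.popleft()
        nexts = list(successors(state, slots))
        if not nexts and any(s != DONE for s in state):
            return False, state  # counterexample: a deadlocked state
        for nxt in nexts:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True, None

ok, witness = check_deadlock_free()
print(ok)  # True: every reachable state of the toy model can progress
```

Unlike a simulation run, which samples one execution at a time, the search visits every reachable state once, which is what lets model checking ascertain properties that simulation alone cannot.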




Towards Formal Modeling and Verification of Cloud Architectures: A Case Study on Hadoop

This paper proposes a holistic approach to verify the correctness of Hadoop systems using model checking techniques: it models Hadoop's parallel architecture, constrains it to valid start-up orderings, and identifies and proves properties including data locality, deadlock-freeness, and non-termination.

Understanding the effects and implications of compute node related failures in hadoop

This paper analyzes Hadoop's behavior under failures involving compute nodes and finds that even a single failure can result in inflated, variable and unpredictable job running times, all undesirable properties in a distributed system.

Petri Nets Formalization of Map/Reduce Paradigm to Optimise the Performance-Cost Tradeoff

A formalization of the Map/Reduce paradigm is presented and used to evaluate performance parameters and to analyze the trade-off between the number of workers, processing time, and resource cost. Results show that the proposed model can determine in advance both the performance of a Map/Reduce-based application within cloud environments and the best performance-cost agreement.

Using Coq in Specification and Program Extraction of Hadoop MapReduce Applications

The goal of this research is to verify the actual running code of MapReduce applications executing on the Hadoop MapReduce framework; the feasibility of two approaches was investigated.

Modeling and Verifying HDFS Using CSP

This paper uses Communicating Sequential Processes (CSP) to model and analyze HDFS, focusing on its dominant operations, reading and writing files, which are formalized in detail.

Performance Analysis Using Petri Net Based MapReduce Model in Heterogeneous Clusters

SPN-MR simulates the elapsed time of any MapReduce job with a known input data size, reducing the time cost of performance tuning, and can provide effective performance evaluation reports for MapReduce programmers.

Task Scheduling in Big Data Platforms: A Systematic Literature Review

Real-time systems - scheduling, analysis, and verification

Real-Time Systems: Scheduling, Analysis, and Verification provides a substantial, up-to-date overview of the verification and validation process of real-time systems.

WOHA: Deadline-Aware Map-Reduce Workflow Scheduling Framework over Hadoop Clusters

WOHA is proposed, an efficient scheduling framework for deadline-aware Map-Reduce workflows that improves the deadline satisfaction ratio by dynamically assigning priorities among workflows based on their progress, and scales up to tens of thousands of concurrently running workflows.

RAFTing MapReduce: Fast recovery on the RAFT

This paper proposes a family of Recovery Algorithms for Fast-Tracking (RAFT) MapReduce, implements RAFT on top of Hadoop, and evaluates it on a 45-node cluster using three common analytical tasks.