The HiBench benchmark suite: Characterization of the MapReduce-based data analysis

Abstract

The MapReduce model is becoming prominent for large-scale data analysis in the cloud. In this paper, we present the benchmarking, evaluation, and characterization of Hadoop, an open-source implementation of MapReduce. We first introduce HiBench, a new benchmark suite for Hadoop. It consists of a set of Hadoop programs, including both synthetic micro-benchmarks and real-world Hadoop applications. We then evaluate and characterize the Hadoop framework using HiBench in terms of speed (i.e., job running time), throughput (i.e., the number of tasks completed per minute), HDFS bandwidth, system resource (e.g., CPU, memory, and I/O) utilization, and data access patterns.
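
The abstract defines two of its metrics precisely enough to illustrate with arithmetic: speed as job running time, and throughput as tasks completed per minute. The following is a minimal sketch of those two definitions only. The `JobRun` record, its field names, and the sample numbers are hypothetical and chosen for illustration; they are not part of HiBench or Hadoop, and the paper may aggregate these metrics differently (for example, across several concurrent jobs).

```python
from dataclasses import dataclass


@dataclass
class JobRun:
    """Hypothetical record of one benchmark run (not a HiBench structure)."""
    start_s: float        # job submission time, in seconds
    end_s: float          # job completion time, in seconds
    tasks_completed: int  # number of tasks that finished during the run


def running_time_s(run: JobRun) -> float:
    """Speed metric: wall-clock job running time in seconds."""
    return run.end_s - run.start_s


def throughput_tasks_per_min(run: JobRun) -> float:
    """Throughput metric: tasks completed per minute of running time."""
    elapsed = running_time_s(run)
    if elapsed <= 0:
        raise ValueError("run must have positive duration")
    return run.tasks_completed / (elapsed / 60.0)


# Illustrative numbers only: 120 tasks finishing over a 5-minute job.
run = JobRun(start_s=0.0, end_s=300.0, tasks_completed=120)
print(running_time_s(run))            # 300.0
print(throughput_tasks_per_min(run))  # 24.0
```

Keeping the two functions separate mirrors the abstract's distinction: a job can have a short running time yet low throughput if it consists of only a few tasks.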

DOI: 10.1109/ICDEW.2010.5452747

16 Figures and Tables

Citations per Year (chart covering 2010 to 2017; vertical axis 0 to 100)

366 Citations

Semantic Scholar estimates that this publication has 366 citations based on the available data.

Cite this paper

@article{Huang2010TheHB,
  title={The HiBench benchmark suite: Characterization of the MapReduce-based data analysis},
  author={Shengsheng Huang and Jie Huang and Jinquan Dai and Tao Xie and Bo Huang},
  journal={2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)},
  year={2010},
  pages={41-51}
}