The HiBench benchmark suite: Characterization of the MapReduce-based data analysis

Abstract

The MapReduce model is becoming prominent for large-scale data analysis in the cloud. In this paper, we present the benchmarking, evaluation and characterization of Hadoop, an open-source implementation of MapReduce. We first introduce HiBench, a new benchmark suite for Hadoop. It consists of a set of Hadoop programs, including both synthetic micro-benchmarks and real-world Hadoop applications. We then evaluate and characterize the Hadoop framework using HiBench in terms of speed (i.e., job running time), throughput (i.e., the number of tasks completed per minute), HDFS bandwidth, system resource (e.g., CPU, memory and I/O) utilization, and data access patterns.
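
As an illustration of the first two metrics named above, the following is a minimal sketch of how job running time and throughput (tasks completed per minute) could be computed from a single job record. The `JobRun` type, its field names, and the sample numbers are hypothetical and are not taken from HiBench or Hadoop; the sketch only assumes that a job's start time, end time, and completed task count are available.

```python
from dataclasses import dataclass


@dataclass
class JobRun:
    """Hypothetical record of one job; not a HiBench or Hadoop type."""
    start_s: float        # job start time, in seconds
    end_s: float          # job end time, in seconds
    tasks_completed: int  # number of tasks that finished


def running_time_s(run: JobRun) -> float:
    """Speed metric: wall-clock job running time in seconds."""
    return run.end_s - run.start_s


def throughput_tasks_per_min(run: JobRun) -> float:
    """Throughput metric: tasks completed per minute of running time."""
    elapsed = running_time_s(run)
    if elapsed <= 0:
        raise ValueError("job must have a positive running time")
    return run.tasks_completed / (elapsed / 60.0)


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    run = JobRun(start_s=0.0, end_s=300.0, tasks_completed=120)
    print(running_time_s(run))            # 300.0
    print(throughput_tasks_per_min(run))  # 24.0
```

The two metrics are kept as separate functions because a job with a short running time can still have low throughput if it completes few tasks, so they are worth reporting side by side.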

DOI: 10.1109/ICDEW.2010.5452747

