Themis: an I/O-efficient MapReduce


"Big Data" computing increasingly uses the MapReduce programming model for scalable processing of large data collections. Many MapReduce jobs are I/O-bound, so minimizing the number of I/O operations is critical to improving their performance. In this work, we present Themis, a MapReduce implementation that reads and writes data records to disk exactly twice, which is the minimum possible for data sets that cannot fit in memory. To minimize I/O, Themis makes fundamentally different design decisions from previous MapReduce implementations. Themis performs a wide variety of MapReduce jobs -- including click log analysis, DNA read sequence alignment, and PageRank -- at nearly the speed of TritonSort's record-setting sort performance [29].

DOI: 10.1145/2391229.2391242

Cite this paper

@inproceedings{Rasmussen2012ThemisAI,
  title={Themis: an I/O-efficient MapReduce},
  author={Alexander Rasmussen and Vinh The Lam and Michael Conley and George Porter and Rishi Kapoor and Amin Vahdat},
  booktitle={SoCC},
  year={2012}
}