CIRCUMFLEX: a scheduling optimizer for MapReduce workloads with shared scans

Abstract

We consider MapReduce clusters designed to support multiple concurrent jobs, concentrating on environments in which the number of distinct datasets is modest relative to the number of jobs. In such scenarios, many datasets end up being scanned by multiple concurrent Map-phase jobs. As has been noted previously, this scenario provides an opportunity for…
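The title names shared scans as the mechanism for exploiting this overlap: when several concurrent Map jobs read the same dataset, one pass over the data can feed every one of them. The Python sketch below illustrates only that basic idea. It is not CIRCUMFLEX's scheduling algorithm. The job tuples, the `group_by_dataset` and `shared_scan` helpers, and the sample datasets are all hypothetical, chosen for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical Map job: (job name, dataset it reads, map function).
Job = Tuple[str, str, Callable[[str], List[Tuple[str, int]]]]


def group_by_dataset(jobs: Iterable[Job]) -> Dict[str, List[Job]]:
    """Group concurrent jobs by the dataset they scan."""
    groups: Dict[str, List[Job]] = defaultdict(list)
    for job in jobs:
        groups[job[1]].append(job)
    return dict(groups)


def shared_scan(records: Iterable[str], jobs: List[Job]) -> Dict[str, List[Tuple[str, int]]]:
    """Read each record once and feed it to every job's map function."""
    outputs: Dict[str, List[Tuple[str, int]]] = {name: [] for name, _, _ in jobs}
    for record in records:  # a single pass over the dataset
        for name, _, map_fn in jobs:
            outputs[name].extend(map_fn(record))
    return outputs


# Hypothetical workload: three jobs over two distinct datasets.
datasets = {"logs": ["a b", "b c"], "users": ["x"]}
jobs: List[Job] = [
    ("wordcount", "logs", lambda r: [(w, 1) for w in r.split()]),
    ("linecount", "logs", lambda r: [("lines", 1)]),
    ("usercount", "users", lambda r: [("users", 1)]),
]

groups = group_by_dataset(jobs)
print(len(jobs), "scans without sharing,", len(groups), "with sharing")
# prints: 3 scans without sharing, 2 with sharing

results = {ds: shared_scan(datasets[ds], js) for ds, js in groups.items()}
```

With sharing, the number of scans falls from one per job to one per distinct dataset, which is why the setting of few datasets and many jobs matters. Deciding when each shared scan should start, the scheduling question the title refers to, is not modeled in this sketch.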
DOI: 10.1145/2146382.2146388

3 Figures and Tables
