Managing Data Transfers in Computer Clusters with Orchestra


Cluster computing applications like MapReduce and Dryad transfer massive amounts of data between their computation stages. These transfers can have a significant impact on job performance, accounting for more than 50% of job completion times. Despite this impact, there has been relatively little work on optimizing the performance of these data transfers, with networking researchers traditionally focusing on per-flow traffic management. We address this limitation by proposing a global management architecture and a set of algorithms that (1) improve the transfer times of common communication patterns, such as broadcast and shuffle, and (2) allow scheduling policies at the transfer level, such as prioritizing a transfer over other transfers. Using a prototype implementation, we show that our solution improves broadcast completion times by up to 4.5X compared to the status quo in Hadoop. We also show that transfer-level scheduling can reduce the completion time of high-priority transfers by 1.7X.

DOI: 10.1145/2018436.2018448

Cite this paper

@inproceedings{Chowdhury2011ManagingDT,
  title={Managing data transfers in computer clusters with {Orchestra}},
  author={Mosharaf Chowdhury and Matei Zaharia and Justin Ma and Michael I. Jordan and Ion Stoica},
  booktitle={SIGCOMM},
  year={2011}
}