True elasticity in multi-tenant data-intensive compute clusters

Abstract

Data-intensive computing (DISC) frameworks scale by partitioning a <i>job</i> into a set of fault-tolerant <i>tasks</i> and distributing those tasks across large clusters. Multi-tenant clusters must accommodate service-level objectives (SLOs) in their resource model, often expressed as a maximum latency for allocating the desired set of resources to every job. When jobs are partitioned into tasks statically, a cluster cannot meet its SLOs while maintaining both high utilization and efficiency. Ideally, we want to give free resources to running jobs and reclaim them instantaneously when new jobs arrive, <i>without</i> losing work. DISC frameworks do not support such <i>elasticity</i> because interrupting running tasks incurs high overheads. Amoeba enables lightweight elasticity in DISC frameworks by identifying points at which running tasks of over-provisioned jobs can be safely exited, committing their outputs, and spawning new tasks for the remaining work. In effect, tasks of DISC jobs are sized dynamically in response to global resource scarcity or abundance. Simulation and deployment of our prototype show that Amoeba speeds up jobs by 32% without compromising utilization or efficiency.
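The safe-exit idea in the abstract can be illustrated with a minimal sketch. This is not Amoeba's implementation: the `Task` class, the `run_task` function, the `should_yield` callback, and the choice of record boundaries as safe points are all assumptions made for illustration. A task checks for a reclaim request at each record boundary, keeps the output it has already produced, and hands back the unprocessed records as a new task.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Task:
    """Hypothetical unit of work over a slice of input records."""
    records: List[int]
    committed: List[int] = field(default_factory=list)


def run_task(
    task: Task,
    process: Callable[[int], int],
    should_yield: Callable[[int], bool],
) -> Tuple[List[int], Optional[Task]]:
    """Run a task, exiting early at a safe point if asked to.

    Assumption: every record boundary is a safe point. `should_yield`
    receives the number of records processed so far and stands in for
    a scheduler's reclaim request.
    """
    for done, record in enumerate(task.records):
        if should_yield(done):
            # Safe exit: keep the output produced so far and return the
            # unprocessed records as a new task, so no work is lost.
            return task.committed, Task(records=task.records[done:])
        task.committed.append(process(record))
    # The task ran to completion; nothing remains.
    return task.committed, None


def square(x: int) -> int:
    return x * x


# The first task is asked to yield after four records.
first = Task(records=list(range(10)))
out1, rest = run_task(first, square, lambda done: done >= 4)

# The remaining work runs later as a separate task, uninterrupted.
out2, leftover = run_task(rest, square, lambda done: False)
```

In this sketch the combined output of the two tasks equals what a single uninterrupted task would have produced, which is the property the abstract describes as reclaiming resources without losing work.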

DOI: 10.1145/2391229.2391253


Citations per Year (chart, 2015 to 2017)

Citation Velocity: 9

Averaging 9 citations per year over the last 3 years.


Cite this paper

@inproceedings{Ananthanarayanan2012TrueEI, title={True elasticity in multi-tenant data-intensive compute clusters}, author={Ganesh Ananthanarayanan and Chris Douglas and Raghu Ramakrishnan and Sriram Rao and Ion Stoica}, booktitle={SoCC}, year={2012} }