Not All Joules are Equal: Towards Energy-Efficient and Green-Aware Data Processing Frameworks
Data-intensive computing (DISC) frameworks scale by partitioning a <i>job</i> across a set of fault-tolerant <i>tasks</i>, then diffusing those tasks across large clusters. Multi-tenanted clusters must accommodate service-level objectives (SLO) in their resource model, often expressed as a maximum latency for allocating the desired set of resources to every job. When jobs are partitioned into tasks statically, a cluster cannot meet its SLOs while maintaining both high utilization and efficiency. Ideally, we want to give resources to jobs when they are free but would expect to reclaim them instantaneously when new jobs arrive, <i>without</i> losing work. DISC frameworks do not support such <i>elasticity</i> because interrupting running tasks incurs high overheads. Amoeba enables lightweight elasticity in DISC frameworks by identifying points at which running tasks of over-provisioned jobs can be safely exited, committing their outputs, and spawning new tasks for the remaining work. Effectively, tasks of DISC jobs are now sized dynamically in response to global resource scarcity or abundance. Simulation and deployment of our prototype shows that Amoeba speeds up jobs by 32% without compromising utilization or efficiency.
Unfortunately, ACM prohibits us from displaying non-influential references for this paper.
To see the full reference list, please visit http://dl.acm.org/citation.cfm?id=2391253.