Occupy the cloud: distributed computing for the 99%

@article{Jonas2017OccupyTC,
  title={Occupy the cloud: distributed computing for the 99\%},
  author={Eric Jonas and Qifan Pu and Shivaram Venkataraman and Ion Stoica and Benjamin Recht},
  journal={Proceedings of the 2017 Symposium on Cloud Computing},
  year={2017}
}
  • Eric Jonas, Qifan Pu, Shivaram Venkataraman, Ion Stoica, Benjamin Recht
  • Published 13 February 2017
  • Computer Science
  • Proceedings of the 2017 Symposium on Cloud Computing
Distributed computing remains inaccessible to a large number of users, in spite of many open source platforms and extensive commercial offerings. While distributed computation frameworks have moved beyond a simple map-reduce model, many users are still left to struggle with complex cluster management and configuration tools, even for running simple embarrassingly parallel jobs. We argue that stateless functions represent a viable platform for these users, eliminating cluster management overhead… 
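The model argued for in the abstract maps an ordinary function over many inputs by launching one stateless cloud function per input. A minimal sketch of that pattern is below, assuming boto3 is installed, AWS credentials are configured, and a Lambda function named "worker" (a hypothetical name, not from the paper) is already deployed and returns a JSON result. The paper's own system is PyWren; this sketch only illustrates the idea, not its implementation.

import json
from concurrent.futures import ThreadPoolExecutor

import boto3

lambda_client = boto3.client("lambda")

def invoke_worker(item):
    # Run one task on a stateless cloud function and return its decoded result.
    response = lambda_client.invoke(
        FunctionName="worker",              # hypothetical, pre-deployed function
        InvocationType="RequestResponse",   # synchronous invocation
        Payload=json.dumps({"input": item}).encode(),
    )
    return json.loads(response["Payload"].read())

def serverless_map(items, max_workers=100):
    # Embarrassingly parallel map: one function invocation per input item.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(invoke_worker, items))

if __name__ == "__main__":
    print(serverless_map(range(10)))

Elasticity comes from the provider scaling function instances on demand; there is no cluster to size, configure, or tear down.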

Figures and Tables from this paper

Citations

Towards Practical Serverless Analytics
TLDR
It is argued that cloud stateless functions represent a viable platform for users, eliminating cluster management overhead and fulfilling the promise of elasticity; a system called Locus is developed that can automate shuffle operations by judiciously provisioning hybrid intermediate storage.
SAND: Towards High-Performance Serverless Computing
TLDR
SAND is presented, a new serverless computing system that provides lower latency, better resource efficiency and more elasticity than existing serverless platforms, and introduces two key techniques: 1) application-level sandboxing, and 2) a hierarchical message bus.
Primula: a Practical Shuffle/Sort Operator for Serverless Computing
TLDR
The paper reports the experience of designing Primula, a serverless sort operator that abstracts users away from the complexities of resource provisioning, skewed data, and stragglers, yielding the most accessible sort primitive to date. A generic sketch of a sort in this style follows.
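The sketch below shows a range-partitioned sort in the general serverless style; the names (partition_by_range, serverless_sort) are invented for illustration, it is not Primula's design, and a local process pool stands in for cloud functions.

import random
from concurrent.futures import ProcessPoolExecutor

def partition_by_range(values, boundaries):
    # Split values into len(boundaries)+1 buckets using sorted split points.
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for v in values:
        i = sum(v > b for b in boundaries)   # number of boundaries strictly below v
        buckets[i].append(v)
    return buckets

def sort_partition(bucket):
    return sorted(bucket)                    # the per-function work

def serverless_sort(values, workers=4):
    # Sample split points so partitions are roughly balanced (a crude answer to skew).
    sample = sorted(random.sample(values, min(len(values), 1000)))
    boundaries = [sample[i * len(sample) // workers] for i in range(1, workers)]
    buckets = partition_by_range(values, boundaries)
    with ProcessPoolExecutor(max_workers=workers) as pool:   # stand-in for cloud functions
        sorted_parts = list(pool.map(sort_partition, buckets))
    # Concatenating sorted buckets in range order yields the fully sorted output.
    return [v for part in sorted_parts for v in part]

if __name__ == "__main__":
    data = [random.randint(0, 10**6) for _ in range(100_000)]
    assert serverless_sort(data) == sorted(data)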
Evaluation of Production Serverless Computing Environments
TLDR
This work claims that current serverless computing environments can support dynamic applications in parallel when a partitioned task is executable on a small function instance, and it deploys a series of functions for distributed data processing to address elasticity.
Data-driven serverless functions for object storage
TLDR
This paper presents an innovative data-driven serverless computing middleware for object storage that allows users to create small, stateless functions that intercept and operate on data flows in a scalable manner without the need to manage a server or a runtime environment.
Granular Computing and Network Intensive Applications: Friends or Foes?
TLDR
The architectural constraints as well as current serverless implementations are examined to develop a position on this topic and influence the next generation of computing services.
Centralized Core-granular Scheduling for Serverless Functions
TLDR
This paper argues for a cluster-level, centralized, core-granular scheduler for serverless functions: centralization eliminates queue imbalances while core granularity reduces interference, which is expected to increase the adoption of serverless computing platforms by latency- and throughput-sensitive applications.
Kappa: a programming framework for serverless computing
TLDR
Kappa is proposed, a framework that simplifies serverless development by using checkpointing to handle lambda function timeouts and by providing concurrency mechanisms that enable parallel computation and coordination.
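The checkpointing idea can be illustrated with a simple resume-from-checkpoint loop; this is a hand-rolled sketch, not Kappa's API, and the local file path stands in for a durable store such as object storage.

import json
import os
import time

CHECKPOINT_PATH = "/tmp/checkpoint.json"   # stand-in for a durable external store
TIME_BUDGET_S = 300                        # e.g. a 5-minute function time limit

def load_checkpoint():
    if os.path.exists(CHECKPOINT_PATH):
        with open(CHECKPOINT_PATH) as f:
            return json.load(f)
    return {"next_index": 0, "partial_sum": 0}

def save_checkpoint(state):
    with open(CHECKPOINT_PATH, "w") as f:
        json.dump(state, f)

def handler(items):
    # Process items, checkpointing so a re-invocation resumes where the last run stopped.
    start = time.monotonic()
    state = load_checkpoint()
    for i in range(state["next_index"], len(items)):
        state["partial_sum"] += items[i] ** 2          # the "work"
        state["next_index"] = i + 1
        if time.monotonic() - start > TIME_BUDGET_S * 0.9:
            save_checkpoint(state)                     # persist before the platform kills us
            return {"done": False}                     # caller re-invokes to continue
    save_checkpoint(state)
    return {"done": True, "result": state["partial_sum"]}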
Network Resource Isolation in Serverless Cloud Function Service
TLDR
The paper argues that the network resource performance of function execution models should be made more visible and predictable in order to broaden the applications of serverless computing.
FaaSdom: a benchmark suite for serverless computing
TLDR
FaaSdom is a modular architecture and proof-of-concept implementation of a benchmark suite for serverless computing platforms that fully automates the deployment, execution, and clean-up of such tests, providing insights on the performance observed by serverless applications.
...

References

SHOWING 1-10 OF 70 REFERENCES
Omega: flexible, scalable schedulers for large compute clusters
TLDR
This work presents a novel approach that addresses increasing scale and the need for rapid response to changing requirements, replacing monolithic cluster scheduler architectures with parallelism, shared state, and lock-free optimistic concurrency control.
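The shared-state, optimistic-concurrency idea can be sketched roughly as follows; the class and function names are invented for illustration, and the lock below merely emulates an atomic compare-and-swap on the shared cell state rather than reproducing Omega's implementation.

import threading

class CellState:
    def __init__(self, machines):
        self._lock = threading.Lock()
        self.version = 0
        self.free = dict(machines)        # machine -> free cores

    def snapshot(self):
        with self._lock:
            return self.version, dict(self.free)

    def try_commit(self, base_version, claims):
        # Atomically apply claims (machine -> cores) iff the state has not changed.
        with self._lock:
            if self.version != base_version:
                return False              # another scheduler committed first: retry
            if any(self.free[m] < c for m, c in claims.items()):
                return False
            for m, c in claims.items():
                self.free[m] -= c
            self.version += 1
            return True

def schedule(cell, cores_needed):
    # One scheduler's optimistic loop: snapshot, choose, attempt commit, retry on conflict.
    while True:
        version, free = cell.snapshot()
        candidates = [m for m, c in free.items() if c >= cores_needed]
        if not candidates:
            return None
        if cell.try_commit(version, {candidates[0]: cores_needed}):
            return candidates[0]

cell = CellState({"m1": 8, "m2": 16})
print(schedule(cell, 4))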
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
TLDR
The results show that Mesos can achieve near-optimal data locality when sharing the cluster among diverse frameworks, can scale to 50,000 (emulated) nodes, and is resilient to failures.
Piccolo: Building Fast, Distributed Programs with Partitioned Tables
TLDR
Experiments show Piccolo to be faster than existing data flow models for many problems, while providing similar fault-tolerance guarantees and a convenient programming interface.
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
TLDR
Resilient Distributed Datasets (RDDs) are presented: a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner, implemented in a system called Spark and evaluated through a variety of user applications and benchmarks.
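A small PySpark example of the RDD abstraction, assuming pyspark is installed; this is standard Spark usage rather than code from the cited paper.

from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-sketch")

# An RDD is an immutable, partitioned collection; transformations record lineage,
# so lost partitions can be recomputed instead of being replicated.
nums = sc.parallelize(range(1_000_000), numSlices=8)
squares = nums.map(lambda x: x * x)          # transformation: lazy, tracked in lineage
total = squares.reduce(lambda a, b: a + b)   # action: triggers distributed execution

print(total)
sc.stop()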
Quincy: fair scheduling for distributed computing clusters
TLDR
It is argued that data-intensive computation benefits from a fine-grain resource sharing model that differs from the coarser semi-static resource allocations implemented by most existing cluster computing architectures.
Disk-Locality in Datacenter Computing Considered Irrelevant
TLDR
Data center computing is becoming pervasive in many organizations, and considerable work has been done to improve the efficiency of computing frameworks such as MapReduce, Hadoop, and Dryad.
Sparrow: distributed, low latency scheduling
TLDR
It is demonstrated that a decentralized, randomized sampling approach provides near-optimal performance while avoiding the throughput and availability limitations of a centralized design.
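The sampling idea can be illustrated with a toy power-of-d-choices placement loop; this simplification omits Sparrow's batch sampling and late binding, and the helper names are invented.

import random

def place_task(queue_lengths, d=2, rng=random):
    # Probe d workers chosen at random and pick the one with the shortest queue.
    probes = rng.sample(range(len(queue_lengths)), d)
    return min(probes, key=lambda w: queue_lengths[w])

# Simulate placing 10,000 tasks on 100 workers.
queues = [0] * 100
for _ in range(10_000):
    w = place_task(queues, d=2)
    queues[w] += 1

print("max queue:", max(queues), "min queue:", min(queues))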
Network support for resource disaggregation in next-generation datacenters
TLDR
The paper explores whether the resources within a server can be disaggregated, with the datacenter instead architected as a collection of standalone resources, to determine whether the network can enable disaggregation at datacenter scales.
SnowFlock: rapid virtual machine cloning for cloud computing
TLDR
SnowFlock, an implementation of the VM fork abstraction, provides sub-second VM cloning, scales to hundreds of workers, consumes few cloud I/O resources, and has negligible runtime overhead.
Network Requirements for Resource Disaggregation
TLDR
This paper uses a workload-driven approach to derive the minimum latency and bandwidth requirements that the network in disaggregated datacenters must provide to avoid degrading application-level performance and explores the feasibility of meeting these requirements with existing system designs and commodity networking technology.
...