Chord: A scalable peer-to-peer lookup service for internet applications
Results from theoretical analysis, simulations, and experiments show that Chord is scalable, with communication cost and the state maintained by each node scaling logarithmically with the number of Chord nodes.
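To make the logarithmic-lookup claim concrete, here is a minimal Python sketch of Chord-style routing. It assumes a centralized simulation in which every node's finger table is built directly from the full membership list; real Chord nodes maintain these tables through their own join and stabilization steps. The names `ChordRing`, `chord_id`, and the 16-bit identifier space `M` are hypothetical choices for illustration. Finger i of node n points to the successor of n + 2^i, and a lookup repeatedly jumps to the closest finger that precedes the key.

```python
import bisect
import hashlib

M = 16          # identifier bits (hypothetical choice for this sketch)
RING = 2 ** M   # size of the circular identifier space


def chord_id(name: str) -> int:
    """Consistent hashing: map a node name or key onto the M-bit ring."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % RING


def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the clockwise ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # wraps past zero; a == b covers the whole ring


def in_open(x: int, a: int, b: int) -> bool:
    """True if x lies strictly inside the clockwise ring interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b


class ChordRing:
    """Centralized simulation: finger tables come from the full membership
    list rather than from a join/stabilize protocol (an assumption)."""

    def __init__(self, node_ids):
        self.ids = sorted(set(node_ids))
        # finger[i] of node n is successor(n + 2^i); finger[0] is n's successor.
        self.fingers = {
            n: [self.successor((n + 2 ** i) % RING) for i in range(M)]
            for n in self.ids
        }

    def successor(self, k: int) -> int:
        """First node clockwise from identifier k, inclusive."""
        i = bisect.bisect_left(self.ids, k)
        return self.ids[i % len(self.ids)]

    def closest_preceding_finger(self, n: int, key: int) -> int:
        """Farthest finger of n that still lies strictly before key."""
        for f in reversed(self.fingers[n]):
            if in_open(f, n, key):
                return f
        return n

    def lookup(self, start: int, key: int):
        """Route from node `start` to the node responsible for `key`.
        Returns (responsible_node, hops)."""
        n, hops = start, 0
        while not in_half_open(key, n, self.fingers[n][0]):
            n = self.closest_preceding_finger(n, key)
            hops += 1
        return self.fingers[n][0], hops


ring = ChordRing(chord_id(f"node-{i}") for i in range(64))
owner, hops = ring.lookup(ring.ids[0], chord_id("some-file.txt"))
print(owner == ring.successor(chord_id("some-file.txt")), hops)
```

Comparing the routed result with `ring.successor(...)` checks that finger-based routing reaches the same node a global view would pick. Because each hop at least halves the remaining distance to the key's predecessor, no lookup in this sketch takes more than `M` hops.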
Spark: Cluster Computing with Working Sets
Spark can outperform Hadoop by 10x in iterative machine learning jobs, and can be used to interactively query a 39 GB dataset with sub-second response time.
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
This paper presents Resilient Distributed Datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are implemented in a system called Spark, which is evaluated through a variety of user applications and benchmarks.
A view of cloud computing
Clearing the clouds away from the true potential and obstacles posed by this computing capability.
Above the Clouds: A Berkeley View of Cloud Computing
This work focuses on SaaS Providers (Cloud Users) and Cloud Providers, which have received less attention than SaaS Users, and uses the term Private Cloud to refer to internal datacenters of a business or other organization, not made available to the general public.
Chord: a scalable peer-to-peer lookup protocol for internet applications
Results from theoretical analysis and simulations show that Chord is scalable: Communication cost and the state maintained by each node scale logarithmically with the number of Chord nodes.
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
The results show that Mesos can achieve near-optimal data locality when sharing the cluster among diverse frameworks, can scale to 50,000 (emulated) nodes, and is resilient to failures.
Dominant Resource Fairness: Fair Allocation of Multiple Resource Types
This work proposes Dominant Resource Fairness (DRF), a generalization of max-min fairness to multiple resource types, and shows that it leads to better throughput and fairness than the slot-based fair sharing schemes in current cluster schedulers.
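As a rough illustration of how DRF extends max-min fairness to several resources, the following Python sketch uses progressive filling: it repeatedly launches one task for the user with the lowest dominant share (the largest fraction of any single resource that user holds), skipping users whose next task no longer fits, until nothing fits. The function name `drf_allocate` and the demand numbers are illustrative assumptions, and the sketch assumes every task demands a positive amount of at least one resource.

```python
from fractions import Fraction


def drf_allocate(capacity, demands):
    """Progressive filling under DRF (illustrative sketch).

    capacity: dict resource -> total amount (integers)
    demands:  dict user -> dict resource -> per-task demand (integers)
    Assumes each task demands a positive amount of some resource,
    otherwise the loop would never terminate.
    Returns dict user -> number of tasks launched.
    """
    used = {r: 0 for r in capacity}
    tasks = {u: 0 for u in demands}

    def dominant_share(u):
        # Largest fraction of any single resource currently held by user u.
        return max(
            Fraction(tasks[u] * demands[u].get(r, 0), capacity[r]) for r in capacity
        )

    def fits(u):
        # Can one more task for u be placed without exceeding any capacity?
        return all(used[r] + demands[u].get(r, 0) <= capacity[r] for r in capacity)

    while True:
        candidates = [u for u in demands if fits(u)]
        if not candidates:
            return tasks
        # Lowest dominant share goes next; ties broken by user name.
        chosen = min(candidates, key=lambda v: (dominant_share(v), v))
        for r in capacity:
            used[r] += demands[chosen].get(r, 0)
        tasks[chosen] += 1


alloc = drf_allocate(
    {"cpu": 9, "mem_gb": 18},
    {"A": {"cpu": 1, "mem_gb": 4}, "B": {"cpu": 3, "mem_gb": 1}},
)
print(alloc)  # {'A': 3, 'B': 2}
```

With 9 CPUs and 18 GB, user A (1 CPU, 4 GB per task) ends with 3 tasks and user B (3 CPUs, 1 GB per task) with 2, so both reach a dominant share of 2/3: A of memory (12/18) and B of CPU (6/9). Using `Fraction` keeps tie comparisons exact.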
Wide-area cooperative storage with CFS
The Cooperative File System (CFS) is a new peer-to-peer read-only storage system that provides provable guarantees for the efficiency, robustness, and load-balance of file storage and retrieval.
GraphX: Graph Processing in a Distributed Dataflow Framework
This paper introduces GraphX, an embedded graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. It demonstrates that GraphX achieves an order-of-magnitude performance gain over the base dataflow framework and matches the performance of specialized graph processing systems while enabling a wider range of computation.