WiscKey: Separating Keys from Values in SSD-conscious Storage
WiscKey is a persistent LSM-tree-based key-value store with a performance-oriented data layout that separates keys from values to minimize I/O amplification and is faster than both LevelDB and RocksDB in all six YCSB workloads.
An analysis of data corruption in the storage stack
This article presents the first large-scale study of data corruption, which analyzes corruption instances recorded in production storage systems containing a total of 1.53 million disk drives, over a period of 41 months.
Serverless Computation with OpenLambda
We present OpenLambda, a new, open-source platform for building next-generation web services and applications in the burgeoning model of serverless computation. We describe the key aspects of …
Parallel programming in Split-C
The authors introduce the Split-C language, a parallel extension of C intended for high-performance programming on distributed-memory multiprocessors, and demonstrate the use of the language in …
The interaction of parallel and sequential workloads on a network of workstations
This paper examines the plausibility of using a network of workstations (NOW) for a mixture of parallel and sequential jobs, and presents a methodology for deriving an optimal delay time for recruiting idle machines for use by parallel programs.
Slacker: Fast Distribution with Lazy Docker Containers
A new container benchmark, HelloBench, is developed to evaluate the startup times of 57 different containerized applications, and the findings guide the design of Slacker, a new Docker storage driver optimized for fast container startup.
Scheduling with implicit information in distributed systems
This paper analyzes the two-phase spin-block algorithm more rigorously, shows that spin time should be increased when a process is receiving messages, and shows how implicit coscheduling behaves under different job layouts and scaling.
IRON file systems
Commodity file system failure policies are shown to be often inconsistent, sometimes buggy, and generally inadequate in their ability to recover from partial disk failures. A new fail-partial failure model for disks is therefore suggested, which incorporates realistic localized faults such as latent sector errors and block corruption.
Analysis of HDFS under HBase: a Facebook messages case study
This study examines how layering causes write amplification when HBase is run on top of HDFS, how tighter integration could result in improved write performance, and whether it makes sense to include an SSD to improve performance while keeping costs in check.
Geiger: monitoring the buffer cache in a virtual machine environment
This paper creates a prototype implementation of techniques that a VMM can use to passively infer useful information about a guest operating system's unified buffer cache and virtual memory system, and implements a novel working set size estimator that allows the VMM to make more informed memory allocation decisions.