Publications
Benchmarking cloud serving systems with YCSB
This work presents the Yahoo! Cloud Serving Benchmark (YCSB) framework, which aims to facilitate performance comparisons among the new generation of cloud data serving systems. It defines a core set of benchmarks and reports results for four widely used systems.
MapReduce Online
Presents a modified version of the Hadoop MapReduce framework that supports online aggregation, which lets users see "early returns" from a job while it is still being computed. The modified framework can also reduce completion times and improve system utilization for batch jobs.
Can machine learning be secure?
Provides a taxonomy of the different types of attacks on machine learning techniques and systems, a variety of defenses against those attacks, and an analytical model giving a lower bound on the attacker's work function.
bLSM: a general purpose log structured merge tree
Data management workloads are increasingly write-intensive and subject to strict latency SLAs. This presents a dilemma: update-in-place systems have unmatched latency but poor write throughput. In …
Dedalus: Datalog in Time and Space
Presents Dedalus, a foundation language for programming and reasoning about distributed systems. Dedalus reduces to a subset of Datalog with negation, aggregate functions, successor, and choice, and adds an explicit notion of logical time to the language.
Online aggregation and continuous query support in MapReduce
Presents a modified version of the Hadoop MapReduce framework that supports online aggregation, which lets users see "early returns" from a job while it is still being computed. The modified framework can also reduce completion times and improve system utilization for batch jobs.
Stasis: flexible transactional storage
Argues that a gap between DBMSs and file systems limits designers of data-oriented applications, and presents Stasis, a storage framework that incorporates ideas from traditional write-ahead logging algorithms and file systems to give applications flexible control over data structures, data layout, robustness, and performance.
Boom analytics: exploring data-centric, declarative programming for the cloud
Uses the Overlog language to implement a "Big Data" analytics stack that is API-compatible with Hadoop and HDFS and provides comparable performance. The paper presents both quantitative and anecdotal results, offering concrete evidence that data-centric design and declarative languages can substantially simplify distributed systems programming.
Walnut: a unified cloud object store
Discusses the motivation for unifying different storage clouds, describes the requirements of a common storage layer, and presents the Walnut design, which uses a quorum-based replication protocol and gives clients one-hop direct access to the data in most regular operations.
To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem?
Introduces "storage age", the number of object overwrites, as a way of normalizing wall-clock time. This allows the results, or similar ones, to be applied across a range of read:write ratios and object replacement rates.