Benchmarking cloud serving systems with YCSB
This work presents the "Yahoo! Cloud Serving Benchmark" (YCSB) framework, which aims to facilitate performance comparisons of the new generation of cloud data serving systems. It defines a core set of benchmarks and reports results for four widely used systems.
PNUTS: Yahoo!'s hosted data serving platform
PNUTS provides data storage organized as hashed or ordered tables, low latency for large numbers of concurrent requests including updates and queries, and novel per-record consistency guarantees. It uses automated load balancing and failover to reduce operational complexity.
Spanner: Google's globally-distributed database
The design and implementation of Spanner are discussed, along with some of the lessons learned along the way and some open challenges in building scalable distributed storage systems.
A Fast Index for Semistructured Data
This work describes the Index Fabric, an indexing structure that provides the efficiency and flexibility needed to optimize ad hoc queries over semistructured data, and shows how "refined paths" optimize specific access paths.
Feeding frenzy: selectively materializing users' event feeds
This work associates feeds with consumers and event streams with producers, and demonstrates that the best performance results from selectively materializing each consumer's feed: events from high-rate producers are retrieved at query time, while events from lower-rate producers are materialized in advance.
Beautiful Data: The Stories Behind Elegant Data Solutions
An Optimal Overlay Topology for Routing Peer-to-Peer Searches
The square-root topology is introduced, and it is shown that this topology significantly improves routing performance compared to power-law networks and other topology types.
Performance of Full Text Search in Structured and Unstructured Peer-to-Peer Systems
A quantitative comparison of full text keyword search in structured and unstructured P2P systems shows that the structured network provides the best response time but has a high document publishing cost, using six times as much bandwidth as the super-peer system.
Spanner: Becoming a SQL System
The database DNA of Spanner is highlighted, including distributed query execution in the presence of resharding, query restarts upon transient failures, range extraction that drives query routing and index seeks, and the improved blockwise-columnar storage format.
Ad Hoc, self-supervising peer-to-peer search networks
Simulation results indicate that the ad hoc networks formed using the described techniques are more efficient than popular supernode topologies for several important scenarios.