Dynamo: Amazon's highly available key-value store

Giuseppe DeCandia, Deniz Hastorun, Madan Mohan Rao Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, Werner Vogels
Symposium on Operating Systems Principles

Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust. The Amazon.com platform, which provides services for many web sites worldwide, is implemented on top of an infrastructure of tens of thousands of servers and network components located in many datacenters around the world. At this scale, small and large components… 

Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service

The experience of operating DynamoDB at massive scale is presented, along with how the architecture continues to evolve to meet the ever-increasing demands of customer workloads.

Millions of Tiny Databases

Physalia is a transactional key-value store, optimized for use in large-scale cloud control planes, which takes advantage of knowledge of transaction patterns and infrastructure design to offer both high availability and strong consistency to millions of clients.

Exploring the design space of highly-available distributed transactions

A three-way trade-off between read isolation, delay (latency), and data freshness is found and demonstrated, and two isolation properties are proposed: TCC- and PSI-.

Scalable and elastic transactional data stores for cloud computing platforms

This dissertation shows that with careful choice of design and features, it is possible to architect scalable DBMSs that efficiently support transactional semantics to ease application design and elastically adapt to fluctuating operational demands to optimize the operating cost.

Performance Sensitive Replication in Geo-distributed Cloud Datastores

This paper presents models that optimize percentiles of response time under normal operation and under a data-center (DC) failure in quorum-based cloud storage systems, and evaluates these models using real-world traces of Twitter, Wikipedia and Gowalla on a Cassandra cluster deployed in Amazon EC2.

ElasTraS: An Elastic Transactional Data Store in the Cloud

This paper proposes ElasTraS, which addresses the scalability and elasticity of the data store in a cloud computing environment so as to leverage the elastic nature of the underlying infrastructure while providing scalable transactional data access.

Efficient and low-cost fault tolerance for web-scale systems

This thesis proposes a novel algorithm, called Scrooge, which reduces the replication costs of fast BFT replication in the presence of unresponsive replicas, and shows the existence of an inherent tradeoff between optimal redundancy and minimal latency in the presence of faulty replicas.

Materialized views in Cassandra

This paper presents an efficient implementation of materialized views in key-value stores that enables complex query processing and is tailored for efficient maintenance.

Associate Adaptable Transactional Information Store in the Cloud Using Distributed Storage and Meta Data Manager

This paper aims to present the design of a system, highlight the major design choices, analyse the various guarantees provided by the system, and identify several key challenges for the research community working on computing in the cloud.

Adaptive Query Scheduling in Key-Value Data Stores

This work proposes the AFIT scheduling strategy, which allows for selective data refreshing and integrates the benefits of SJF-based scheduling with an EDF-like policy; it not only strikes a fine trade-off between QoS and QoD but also automatically adapts to workload settings.



Cluster-based scalable network services

A general, layered architecture for building cluster-based scalable network services that encapsulates the above requirements for reuse, and a service-programming model based on composable workers that perform transformation, aggregation, caching, and customization (TACC) of Internet content, are proposed.

FAB: building distributed enterprise disk arrays from commodity components

It is argued that voting is practical and necessary for reliable, high-throughput storage systems such as FAB, a distributed disk array that provides the reliability of traditional enterprise arrays with lower cost and better scalability.

Beehive: O(1) Lookup Performance for Power-Law Query Distributions in Peer-to-Peer Overlays

A proactive replication framework that can provide constant lookup performance for common Zipf-like query distributions and can realistically achieve good latencies, outperform passive caching, and adapt efficiently to sudden changes in object popularity, also known as flash crowds.

SEDA: an architecture for well-conditioned, scalable internet services

This work presents the SEDA design and an implementation of an Internet services platform based on this architecture, and describes several control mechanisms for automatic tuning and load conditioning, including thread pool sizing, event batching, and adaptive load shedding.

OceanStore: an architecture for global-scale persistent storage

OceanStore monitoring of usage patterns allows adaptation to regional outages and denial of service attacks; monitoring also enhances performance through pro-active movement of data.

Chord: A scalable peer-to-peer lookup service for internet applications

Results from theoretical analysis, simulations, and experiments show that Chord is scalable, with communication cost and the state maintained by each node scaling logarithmically with the number of Chord nodes.

Antiquity: exploiting a secure log for wide-area distributed storage

Antiquity uses a secure log to maintain data integrity, replicates each log on multiple servers for durability, and uses dynamic Byzantine fault-tolerant quorum protocols to ensure consistency among replicas.

Farsite: federated, available, and reliable storage for an incompletely trusted environment

The design of Farsite and the lessons learned from implementing much of that design are reported, including locally caching file data, lazily propagating file updates, and varying the duration and granularity of content leases.

Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility

The storage management and caching in PAST, a large-scale peer-to-peer persistent storage utility based on a self-organizing, Internet-based overlay network of storage nodes that cooperatively route file queries, store multiple replicas of files, and cache additional copies of popular files, is evaluated.

Managing update conflicts in Bayou, a weakly connected replicated storage system

The motivation for and design of these mechanisms for conflict detection and per-write conflict resolution based on client-provided procedures are presented, and the experiences gained with an initial implementation of the system are described.