Windows Azure Storage: a highly available cloud storage service with strong consistency

@inproceedings{calder2011windows,
  title={Windows Azure Storage: a highly available cloud storage service with strong consistency},
  author={Brad Calder and Ju Wang and Aaron Ogus and Niranjan Nilakantan and Arild Skjolsvold and Sam McKelvie and Yikang Xu and Shashwat Srivastav and Jiesheng Wu and Huseyin Simitci and Jaidev Haridas and Chakravarthy Uddaraju and Hemal Khatri and Andrew Edwards and Vaman Bedekar and Shane Mainali and Rafay Abbasi and Arpit Agarwal and Mian Fahim ul Haq and Muhammad Inaam Ul Haq and Deepali Bhardwaj and Sowmya Dayanand and Anitha Adusumilli and Marvin McNett and Sriram Sankaran and Kavitha Manivannan and Leonidas Rigas},
  booktitle={Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles},
  year={2011}
}
  • B. Calder, Ju Wang, Leonidas Rigas
  • Published 23 October 2011
  • Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Windows Azure Storage (WAS) is a cloud storage system that provides customers the ability to store seemingly limitless amounts of data for any duration of time. WAS customers have access to their data from anywhere at any time and only pay for what they use and store. In WAS, data is stored durably using both local and geographic replication to facilitate disaster recovery. Currently, WAS storage comes in the form of Blobs (files), Tables (structured storage), and Queues (message delivery). In… 

Data Storage Management in Cloud Environments

This article provides a comprehensive taxonomy that covers key aspects of cloud-based data stores: data model, data dispersion, data consistency, data transaction service, and data management cost.

Consistency-based service level agreements for cloud storage

Evaluations running on a worldwide test bed with geo-replicated data show that the Pileus system adapts to varying client-server latencies to provide service that matches or exceeds the best static consistency choice and server selection scheme.
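
A consistency-based SLA can be pictured as a client-side selection loop. The following is a simplified sketch under stated assumptions, not the Pileus API. It assumes an SLA is an ordered list of subSLAs, each pairing a consistency requirement with a latency bound and a utility. It encodes consistency as a maximum tolerated staleness and returns the first subSLA that some replica is estimated to meet. The real system's selection logic is more involved. `SubSLA`, `Replica`, and `choose_target` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class SubSLA:
    max_staleness_s: float  # assumed encoding: 0.0 ~ strong, float("inf") ~ eventual
    max_latency_ms: float   # the read must complete within this bound
    utility: float          # value to the application if this subSLA is met

@dataclass
class Replica:
    name: str
    est_latency_ms: float   # client-observed round-trip estimate (hypothetical input)
    staleness_s: float      # how far this replica lags the primary (0.0 for the primary)

def choose_target(sla: List[SubSLA],
                  replicas: List[Replica]) -> Optional[Tuple[SubSLA, Replica]]:
    """Walk the SLA in preference order; return the first subSLA that some
    replica is currently estimated to satisfy, paired with the fastest such replica."""
    for sub in sla:
        candidates = [
            r for r in replicas
            if r.staleness_s <= sub.max_staleness_s
            and r.est_latency_ms <= sub.max_latency_ms
        ]
        if candidates:
            return sub, min(candidates, key=lambda r: r.est_latency_ms)
    return None  # no subSLA can be met with the current estimates
```

Take a distant primary (200 ms) and a nearby secondary that lags by 30 s. The strong subSLA cannot be met, so the sketch falls back to a bounded-staleness subSLA served by the secondary. Moving the client next to the primary lets it meet the strong subSLA instead.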

Isolation in cloud storage

This work pushes consistency information down the stack by associating versions within the multi-version store with application-level timestamps; conversely, it pushes performance information up the stack by allowing applications to query the estimated cost of issuing a read operation.

Azure Data Lake Store: A Hyperscale Distributed File Service for Big Data Analytics

An overview of ADLS architecture, design points, and performance is presented, which includes its design for handling multiple storage tiers, exabyte scale, and comprehensive security and data sharing features.

On limitations of using cloud storage for data replication

This paper uses the consensus number of a shared storage abstraction as a measure of its power to facilitate the implementation of data replication. It demonstrates that a key-value store (KVS) is a very simple primitive, no different from read/write registers in this sense, and that a replica capable of the typical operations on timestamped data is fundamentally more powerful than a KVS.

Liquid Cloud Storage

This work shows that a liquid system can be operated to enable flexible and essentially optimal combinations of storage durability, storage overhead, repair bandwidth usage, and access performance.

Distributed storage evaluation on a three-wide inter-data center deployment

Three popular distributed object stores, namely Quantcast-QFS, Swift and Tahoe-LAFS, are considered and tested in a three-wide data center environment and the findings are reported.

FSaaS: File System as a Service

  • Dapeng Dong, J. Herbert
  • 2014 IEEE 38th Annual Computer Software and Applications Conference
  • 2014
The evaluation demonstrates that FSaaS can significantly improve user experience, boost operation performance, and reduce object operational delay; the scalability and flexibility of FSaaS are also demonstrated via an asynchronous operation mode.

Efficient Batched Synchronization in Dropbox-Like Cloud Storage Services

This work proposes the update-batched delayed synchronization (UDS) mechanism, which acts as a middleware between the user’s file storage system and a cloud storage application to significantly reduce the overhead caused by session maintenance traffic, while preserving the rapid file synchronization that users expect from cloud storage services.

Architecture of a distributed storage that combines file system, memory and computation in a single layer

A single system called Pangea is proposed that can manage all data—both intermediate and long-lived data, and their buffer/caching, page replacement, data placement optimization, and failure recovery—all in one monolithic distributed storage system, without any layering.

References

Cassandra: a decentralized structured storage system

Cassandra is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers, while providing highly available service with no single point of failure.

Megastore: Providing Scalable, Highly Available Storage for Interactive Services

Megastore provides fully serializable ACID semantics within fine-grained partitions of data, which allows us to synchronously replicate each write across a wide area network with reasonable latency and support seamless failover between datacenters.

PNUTS: Yahoo!'s hosted data serving platform

PNUTS provides data storage organized as hashed or ordered tables, low latency for large numbers of concurrent requests including updates and queries, and novel per-record consistency guarantees; it also uses automated load balancing and failover to reduce operational complexity.

Bigtable: A Distributed Storage System for Structured Data

This paper describes the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, as well as the design and implementation of Bigtable.
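
Bigtable's data model is a sparse, sorted map indexed by row key, column key, and timestamp, with uninterpreted byte-string values. The toy class below sketches that model in memory. It is an illustrative assumption, not Bigtable's client API, and it omits tablets, column-family configuration, and garbage collection of old versions. `SparseTable`, `put`, `get`, and `scan` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

class SparseTable:
    """Toy in-memory model of a Bigtable-style map:
    (row key, "family:qualifier" column, timestamp) -> bytes."""

    def __init__(self) -> None:
        # row -> column -> list of (timestamp, value), kept newest first
        self._rows: Dict[str, Dict[str, List[Tuple[int, bytes]]]] = defaultdict(dict)

    def put(self, row: str, column: str, ts: int, value: bytes) -> None:
        versions = self._rows[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row: str, column: str, ts: Optional[int] = None) -> Optional[bytes]:
        """Return the newest version at or before ts (the latest if ts is None)."""
        for t, v in self._rows.get(row, {}).get(column, []):
            if ts is None or t <= ts:
                return v
        return None  # absent cells cost nothing: the map is sparse

    def scan(self, start: str, end: str):
        """Yield (row, columns) for start <= row < end in lexicographic row order."""
        for row in sorted(self._rows):
            if start <= row < end:
                yield row, self._rows[row]
```

Suppose row `com.cnn.www` holds versions of column `contents:` at timestamps 3 and 5. Then `get` with no timestamp returns the version written at 5, while `get(..., ts=4)` returns the one written at 3. Keeping rows sorted is what makes range scans over adjacent keys cheap.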

Dynamo: Amazon's highly available key-value store

This paper presents Dynamo, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience. Dynamo makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.
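
Dynamo's object versioning relies on vector clocks. A version supersedes another only if its clock is greater than or equal in every component; otherwise the two are concurrent. The sketch below assumes clocks are plain dicts from node id to counter. It shows how a store could discard superseded versions and hand the remaining concurrent siblings to the application for reconciliation. The helpers `increment`, `descends`, and `reconcile` are hypothetical. Dynamo's clock truncation and its actual get/put interface are not modeled.

```python
from typing import Dict, List, Tuple

VectorClock = Dict[str, int]  # node id -> counter; a missing node counts as 0

def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of clock with node's counter bumped (a write coordinated by node)."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new

def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if a has seen every event recorded in b (a >= b component-wise)."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def reconcile(versions: List[Tuple[VectorClock, str]]) -> List[Tuple[VectorClock, str]]:
    """Drop versions strictly dominated by another version, plus duplicates of an
    earlier entry. The survivors are concurrent siblings the application must merge."""
    survivors = []
    for i, (clock, value) in enumerate(versions):
        superseded = False
        for j, (other, _) in enumerate(versions):
            if i == j:
                continue
            strictly_newer = descends(other, clock) and not descends(clock, other)
            equal_and_earlier = descends(other, clock) and descends(clock, other) and j < i
            if strictly_newer or equal_and_earlier:
                superseded = True
                break
        if not superseded:
            survivors.append((clock, value))
    return survivors
```

Consider a node `Sx` that writes twice, after which `Sy` and `Sz` each update that second version independently. `reconcile` discards the Sx-only version and returns both the `Sy` and `Sz` branches, because neither clock descends from the other.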

Object Storage on CRAQ: High-Throughput Chain Replication for Read-Mostly Workloads

Additional design and implementation considerations for geo-replicated CRAQ storage across multiple datacenters to provide locality-optimized operations are explored and multi-object atomic updates and multicast optimizations for large-object updates are discussed.

VL2: a scalable and flexible data center network

VL2 is a practical network architecture that scales to support huge data centers with uniform high capacity between servers, performance isolation between services, and Ethernet layer-2 semantics; it has been implemented as a working prototype.

The Chubby lock service for loosely-coupled distributed systems

The paper describes the initial design and expected use, compares it with actual use, and explains how the design had to be modified to accommodate the differences.

SnapMirror: File-System-Based Asynchronous Mirroring for Disaster Recovery

This paper presents SnapMirror, an asynchronous mirroring technology that leverages file system snapshots to ensure the consistency of the remote mirror and to optimize data transfer. It also shows that exploiting file system knowledge of deletions is critical to achieving any reduction in data transfer for no-overwrite file systems such as WAFL and LFS.

Chain Replication for Supporting High Throughput and Availability

Besides outlining the chain replication protocols themselves, the paper uses simulation experiments to explore the performance characteristics of a prototype implementation and discusses several object-placement strategies (including schemes based on distributed hash table routing).
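
Chain replication arranges replicas in a line. Updates are sent to the head and passed along to the tail, and queries are served by the tail. This gives strong consistency, because the tail only holds updates that every replica has applied. The sketch below models that flow as a single-process simulation under simplifying assumptions: no failures, no acknowledgments, and no reconfiguration. `Chain`, `update`, `step`, and `query` are hypothetical names, not taken from the paper.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

class Chain:
    """Failure-free toy model of chain replication: updates enter at the head
    (node 0) and move one hop per step toward the tail; queries read the tail."""

    def __init__(self, length: int) -> None:
        if length < 1:
            raise ValueError("a chain needs at least one node")
        self.stores: List[Dict[str, str]] = [{} for _ in range(length)]
        # outbox[i]: updates node i has applied but not yet sent to node i + 1 (FIFO)
        self.outbox: List[Deque[Tuple[str, str]]] = [deque() for _ in range(length - 1)]

    def update(self, key: str, value: str) -> None:
        self.stores[0][key] = value  # the head applies the update first
        if self.outbox:
            self.outbox[0].append((key, value))

    def step(self) -> None:
        # Visit links tail-first so each update advances at most one hop per step.
        for i in reversed(range(len(self.outbox))):
            if self.outbox[i]:
                key, value = self.outbox[i].popleft()
                self.stores[i + 1][key] = value
                if i + 1 < len(self.outbox):  # forward unless node i + 1 is the tail
                    self.outbox[i + 1].append((key, value))

    def query(self, key: str) -> Optional[str]:
        # Only the tail answers reads, so clients never observe an update
        # that has not been applied by every replica in the chain.
        return self.stores[-1].get(key)
```

Run one `update` and one `step` on a three-node chain. The middle node already has the new value, but `query` still returns nothing. Only after the second `step` does the update reach the tail and become visible.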