Analysis of a Stochastic Model of Replication in Large Distributed Storage Systems

Wen Sun, Véronique Simon, Sébastien Monnet, Philippe Robert, and Pierre Sens. Analysis of a Stochastic Model of Replication in Large Distributed Storage Systems. Proceedings of the ACM on Measurement and Analysis of Computing Systems, pages 1–21.
  • Published 2 January 2017
  • Computer Science
Distributed storage systems such as Hadoop File System or Google File System (GFS) ensure data availability and durability using replication. Persistence is achieved by replicating the same data block on several nodes, and ensuring that a minimum number of copies are available on the system at any time. Whenever the contents of a node are lost, for instance due to a hard disk crash, the system regenerates the data blocks stored before the failure by transferring them from the remaining replicas… 
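The replicate-and-regenerate mechanism described in this abstract can be illustrated with a small Monte Carlo simulation. The sketch below is a toy birth-death chain, not the stochastic model analysed in the paper (the truncated abstract does not state that model). It assumes that each live replica of a block is lost independently at an exponential rate `mu`, that while fewer than the target `d` copies exist one missing copy is regenerated at rate `lam` (one repair at a time), and that the block is lost once no copy remains. The function names and parameters are hypothetical.

```python
import random


def time_to_loss(d: int, lam: float, mu: float, rng: random.Random) -> float:
    """Simulate one data block from d replicas until its last copy is lost.

    Toy assumptions (not the paper's model):
      * each live replica is lost independently at exponential rate mu > 0;
      * while fewer than d copies exist, one missing copy is regenerated
        at rate lam (a single repair at a time).
    """
    k, t = d, 0.0
    while k > 0:
        fail_rate = k * mu
        repair_rate = lam if k < d else 0.0
        total = fail_rate + repair_rate
        t += rng.expovariate(total)       # exponential holding time in state k
        if rng.random() < fail_rate / total:
            k -= 1                        # one replica disappears
        else:
            k += 1                        # one missing replica is regenerated
    return t


def mean_time_to_loss(d: int, lam: float, mu: float,
                      runs: int = 2_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected lifetime of a block."""
    rng = random.Random(seed)
    return sum(time_to_loss(d, lam, mu, rng) for _ in range(runs)) / runs
```

Without regeneration, `mean_time_to_loss(3, 0.0, 1.0)` should be close to 1/3 + 1/2 + 1 ≈ 1.83, the sum of the expected waiting times for the three successive losses. With `lam = 10.0` the expected lifetime of this toy chain rises to roughly 25 time units.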


Toward Efficient Block Replication Management in Distributed Storage

This paper proposes an adaptive scheme for dynamic block replication, together with an efficient replica placement policy, to improve the I/O performance of a distributed file system, and demonstrates that the approach improves the usage efficiency of data replicas with an acceptable replication-management overhead.

CAnDoR: Consistency Aware Dynamic data Replication

This paper proposes CAnDoR, an approach that dynamically adapts replication to how data is used (read/write frequencies and locations) and to the consistency protocol that manages each piece of data.

Decentralized utility- and locality-aware replication for heterogeneous DHT-based P2P cloud storage systems

This paper proposes Pyramid, the first fully decentralized utility- and locality-aware replication approach for Skip Graph-based P2P cloud storage systems; it improves the utility and the locality-awareness of replicas simultaneously, by factors of about 1.2 and 1.1 respectively.



Scattering and Placing Data Replicas to Enhance Long-Term Durability

This paper proposes an approach for finely tuning the proportion of common content stored by the nodes and for controlling the distribution of storage load when new data block copies are created, together with a simulation model that supports this tuning.

Cassandra: a decentralized structured storage system

Cassandra is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers, while providing highly available service with no single point of failure.

Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility

This paper evaluates storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility built on a self-organizing, Internet-based overlay network of storage nodes that cooperatively route file queries, store multiple replicas of files, and cache additional copies of popular files.

Dynamo: amazon's highly available key-value store

This paper presents Dynamo, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience. Dynamo makes extensive use of object versioning and application-assisted conflict resolution, in a way that gives developers a novel interface.
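Dynamo's object versioning is based on vector clocks: each version carries a map from node to counter, and two versions conflict when neither clock descends from the other, which is where application-assisted reconciliation comes in. The sketch below is a generic illustration of that comparison rule, not Amazon's code; the helper names `bump`, `descends`, `compare` and the node names `Sx`, `Sy`, `Sz` are hypothetical.

```python
from typing import Dict

Clock = Dict[str, int]  # node name -> number of writes that node has coordinated


def bump(clock: Clock, node: str) -> Clock:
    """Return a new clock recording one more write coordinated by `node`."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def descends(a: Clock, b: Clock) -> bool:
    """True if `a` has seen every update recorded in `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: Clock, b: Clock) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    # Neither clock dominates: the versions are concurrent, so both must be
    # kept and reconciled (e.g. by the application) on a later read.
    return "concurrent"


v1 = bump({}, "Sx")      # {'Sx': 1}
v2 = bump(v1, "Sx")      # {'Sx': 2}
v3 = bump(v2, "Sy")      # {'Sx': 2, 'Sy': 1}
v4 = bump(v2, "Sz")      # {'Sx': 2, 'Sz': 1}
print(compare(v3, v2))   # a supersedes b
print(compare(v3, v4))   # concurrent
```

Here `v3` and `v4` were written by different nodes on top of the same ancestor `v2`, so neither can be discarded automatically.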

Designing a DHT for Low Latency and High Throughput

New techniques that resulted from this exploration include use of latency predictions based on synthetic co-ordinates, efficient integration of lookup routing and data fetching, and a congestion control mechanism suitable for fetching data striped over large numbers of servers.

Failure Trends in a Large Disk Drive Population

It is found that temperature and activity levels were much less correlated with drive failures than previously reported, and models based on SMART parameters alone are unlikely to be useful for predicting individual drive failures.

Chaoticity on path space for a queueing network with selection of the shortest queue among several

  • C. Graham
  • Mathematics
    Journal of Applied Probability
  • 2000
We consider a network with N infinite-buffer queues with service rates λ, and global task arrival rate Nν. Each task is allocated L queues among N with uniform probability and joins the least loaded…
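The allocation rule in this abstract, where each task samples L of the N queues and joins the least loaded of the sampled queues, can be simulated directly. The sketch below is a minimal continuous-time simulation, not Graham's analysis. It assumes Poisson arrivals at global rate N·ν, exponential service at rate λ per queue, L distinct queues sampled uniformly, and ties broken in favour of the first queue sampled; `jsq_mean_queue` and its parameter names (`nu` for ν, `lam` for λ) are hypothetical.

```python
import random


def jsq_mean_queue(N: int, L: int, nu: float, lam: float,
                   events: int = 200_000, seed: int = 0) -> float:
    """Time-averaged number of tasks per queue in a toy JSQ(L) network.

    Assumptions: N queues each served at exponential rate lam, Poisson task
    arrivals at global rate N * nu, and each arrival samples L distinct queues
    uniformly at random and joins the shortest one (ties go to the first one
    sampled). Requires nu < lam for stability.
    """
    rng = random.Random(seed)
    q = [0] * N
    busy = 0            # number of non-empty queues
    in_system = 0       # total number of tasks present
    t = area = 0.0      # elapsed time and integral of in_system over time
    for _ in range(events):
        arr_rate = N * nu
        dep_rate = lam * busy
        rate = arr_rate + dep_rate
        dt = rng.expovariate(rate)
        area += in_system * dt
        t += dt
        if rng.random() < arr_rate / rate:
            # Arrival: sample L queues and join the least loaded one.
            j = min(rng.sample(range(N), L), key=lambda i: q[i])
            if q[j] == 0:
                busy += 1
            q[j] += 1
            in_system += 1
        else:
            # Departure: every busy queue serves at rate lam, so pick one
            # uniformly among the busy queues (by rejection sampling).
            j = rng.randrange(N)
            while q[j] == 0:
                j = rng.randrange(N)
            q[j] -= 1
            in_system -= 1
            if q[j] == 0:
                busy -= 1
    return area / t / N
```

With `L = 1` each queue behaves as an independent M/M/1 queue, so for ν = 0.5 and λ = 1 the result should be close to ρ/(1 − ρ) = 1; with `L = 2` it drops markedly, which is the effect of choosing among several queues.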

When Multi-hop Peer-to-Peer Lookup Matters

It is concluded that the multi-hop optimizations make sense only for truly vast and very dynamic peer networks, and that resource trends indicate this scale is on the rise.

The power of two random choices: a survey of techniques and results

The important implication of this result is that even a small amount of choice can lead to drastically different results in load balancing.
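The effect of "a small amount of choice" can be checked with the classic balls-into-bins experiment: n balls are placed into n bins, and each ball samples d bins uniformly at random and goes to the least loaded one. A minimal sketch follows; `max_load` is a hypothetical helper, bins are sampled with replacement, and ties go to the first bin sampled.

```python
import random


def max_load(n: int, d: int, seed: int = 0) -> int:
    """Throw n balls into n bins; each ball samples d bins uniformly at random
    (with replacement) and is placed in the least loaded one, ties going to
    the first bin sampled. Returns the largest bin load at the end."""
    rng = random.Random(seed)
    bins = [0] * n
    for _ in range(n):
        candidates = [rng.randrange(n) for _ in range(d)]
        best = min(candidates, key=lambda i: bins[i])
        bins[best] += 1
    return max(bins)


print(max_load(100_000, 1), max_load(100_000, 2))
```

For n = 100 000, a single choice typically gives a maximum load of 7 or 8, while two choices typically bring it down to 3 or 4, consistent with the known (1 + o(1)) ln n / ln ln n bound for one choice versus ln ln n / ln d + O(1) for d ≥ 2 choices.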

Poisson Processes

  • M. R. Leadbetter
  • Mathematics
    International Encyclopedia of Statistical Science
  • 2011
Consider a random process that models the occurrence and evolution of events in time, and the random variable N(t), which represents the largest n such that S_n ≤ t.
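For a homogeneous Poisson process, the arrival epochs S_n are partial sums of i.i.d. exponential inter-arrival times, and N(t) is recovered by counting how many epochs are ≤ t. The sketch below assumes a constant intensity given by a hypothetical `rate` parameter; the helper names `arrival_times` and `count` are illustrative only.

```python
import bisect
import random
from typing import List


def arrival_times(rate: float, horizon: float, rng: random.Random) -> List[float]:
    """Arrival epochs S_1 < S_2 < ... <= horizon of a homogeneous Poisson
    process, built from i.i.d. exponential inter-arrival times."""
    times: List[float] = []
    s = 0.0
    while True:
        s += rng.expovariate(rate)
        if s > horizon:
            return times
        times.append(s)


def count(times: List[float], t: float) -> int:
    """N(t): the largest n with S_n <= t, i.e. the number of epochs <= t
    (0 if none). `times` must be sorted in increasing order."""
    return bisect.bisect_right(times, t)


rng = random.Random(0)
runs = [count(arrival_times(2.0, 10.0, rng), 10.0) for _ in range(5_000)]
print(sum(runs) / len(runs))  # should be close to E[N(t)] = rate * t = 20
```

`bisect_right` returns the number of sorted epochs that are ≤ t, which is exactly the largest index n with S_n ≤ t.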