# Load Balancing Performance in Distributed Storage with Regular Balanced Redundancy

@article{Akta2019LoadBP, title={Load Balancing Performance in Distributed Storage with Regular Balanced Redundancy}, author={Mehmet Fatih Aktaş and Amir Behrouzi-Far and Emina Soljanin and Philip A. Whiting}, journal={2019 XVI International Symposium "Problems of Redundancy in Information and Control Systems" (REDUNDANCY)}, year={2019}, pages={75-80} }

Contention at the storage nodes is the main cause of long and variable data access times in distributed storage systems. The offered load on the system must be balanced across the storage nodes in order to minimize contention, and load balancing should be robust against skews and fluctuations in content popularities. In practice, data objects are replicated across multiple nodes to allow for load balancing. However, redundancy increases the storage requirement and should be used efficiently. We…
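A minimal sketch of what "regular balanced" redundancy can look like, assuming (this is not taken from the truncated abstract) that every object is stored on $d$ nodes and every node stores the same number of objects. The cyclic layout below is one hypothetical design with that property; `cyclic_placement` and `max_node_load` are illustrative names, and splitting each object's load evenly over its replicas is a simplifying assumption rather than the paper's load-balancing policy.

```python
from collections import Counter

def cyclic_placement(n: int, d: int) -> list[list[int]]:
    """Hypothetical cyclic design: object i lives on nodes i, i+1, ..., i+d-1 (mod n).

    With n objects on n nodes, every object has d replicas and every node
    stores exactly d objects, so the design is regular and balanced.
    """
    if not 1 <= d <= n:
        raise ValueError("need 1 <= d <= n")
    return [[(i + j) % n for j in range(d)] for i in range(n)]

def max_node_load(placement: list[list[int]], object_loads: list[float], n: int) -> float:
    """Maximum node load when each object's load is split evenly over its replicas.

    Even splitting is a simplifying assumption; a load balancer could do better
    by shifting load toward less busy replicas.
    """
    node_load = [0.0] * n
    for nodes, load in zip(placement, object_loads):
        share = load / len(nodes)
        for v in nodes:
            node_load[v] += share
    return max(node_load)

placement = cyclic_placement(n=6, d=3)
per_node = Counter(v for nodes in placement for v in nodes)
print(sorted(per_node.values()))  # [3, 3, 3, 3, 3, 3]

skewed = [3.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # all demand on one object
print(max_node_load(cyclic_placement(6, 1), skewed, 6))  # 3.0
print(max_node_load(cyclic_placement(6, 3), skewed, 6))  # 1.0
```

With a single copy ($d = 1$) the popular object's whole load lands on one node; with three copies the same demand is spread over three nodes, at the cost of tripling the storage.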

## 7 Citations

Evaluating Load Balancing Performance in Distributed Storage With Redundancy

- Computer Science, Mathematics
- IEEE Transactions on Information Theory
- 2021

The load balance in a system of nodes in which each object is stored at $d$ different nodes improves multiplicatively with $d$; the analysis expresses the load balance in terms of the maximum of $d$ consecutive spacings between the order statistics of uniform random variables.
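The quantity named above, the maximum over windows of $d$ consecutive spacings between sorted uniform samples, is easy to estimate numerically. The sketch below only computes that statistic by Monte Carlo; how it maps onto node load is the cited paper's result and is not reproduced here. The function name `max_d_spacing` and the choice of including the endpoints 0 and 1 as boundary points are assumptions for illustration.

```python
import random

def max_d_spacing(n: int, d: int, rng: random.Random) -> float:
    """Largest sum of d consecutive spacings among n uniform points on [0, 1].

    The n sorted points, together with the endpoints 0 and 1, split [0, 1]
    into n + 1 spacings. A window of d adjacent spacings spans the distance
    between two boundary points that are d positions apart.
    """
    if not 1 <= d <= n + 1:
        raise ValueError("need 1 <= d <= n + 1")
    points = sorted(rng.random() for _ in range(n))
    edges = [0.0] + points + [1.0]
    return max(edges[i + d] - edges[i] for i in range(len(edges) - d))

rng = random.Random(0)
for d in (1, 2, 4, 8):
    trials = [max_d_spacing(1000, d, rng) for _ in range(200)]
    print(d, sum(trials) / len(trials))
```

Because spacings are non-negative, the maximal $d$-spacing can only grow with $d$ for a fixed sample, and with $d = n + 1$ the single window covers the whole interval.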

Data Freshness in Leader-Based Replicated Storage

- Computer Science, Mathematics
- 2020 IEEE International Symposium on Information Theory (ISIT)
- 2020

It is shown that, depending on the relative speed of the write operation to the two groups of nodes, there exists an optimal number of leaders that minimizes the average age of the retrieved data, and that this number increases as the relative speed of writing on leaders increases.

Distributed Multi-User Secret Sharing

- Computer Science
- IEEE Transactions on Information Theory
- 2021

It is shown how to modify the proposed protocols in order to construct schemes with balanced storage load and communication complexity, thereby demonstrating schemes that are optimal in terms of both parameters.

A Geometric View of the Service Rates of Codes Problem and its Application to the Service Rate of the First Order Reed-Muller Codes

- Computer Science, Mathematics
- 2020 IEEE International Symposium on Information Theory (ISIT)
- 2020

This work derives upper bounds on the service rates of the first order Reed-Muller codes and the simplex codes and shows that given the service rate region of a code, a lower bound on the minimum distance of the code can be obtained.

A Combinatorial View of the Service Rates of Codes Problem, its Equivalence to Fractional Matching and its Connection with Batch Codes

- Computer Science, Mathematics
- 2020 IEEE International Symposium on Information Theory (ISIT)
- 2020

It is shown that the service capacity of a coded storage system equals the fractional matching number in the graph representation of the code, and thus is lower bounded and upper bounded by the matching number and the vertex cover number, respectively.
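As a generic illustration of the chain matching number $\le$ fractional matching number $\le$ vertex cover number (on an arbitrary small graph, not the graph representation of any particular code), consider the triangle $K_3$. Summing the vertex constraints counts each edge twice, so $2\sum_e x_e \le 3$, and the bound is attained by putting weight $\tfrac12$ on every edge:

$$
\begin{aligned}
G &= K_3, \quad V = \{1,2,3\}, \quad E = \{12,\ 13,\ 23\},\\
\nu(G) &= 1 \quad \text{(any two edges share a vertex)},\\
\nu^*(G) &= \max\Big\{\textstyle\sum_{e \in E} x_e \;:\; \sum_{e \ni v} x_e \le 1 \ \forall v,\ x_e \ge 0\Big\} = \tfrac{3}{2} \quad (x_e = \tfrac12\ \forall e),\\
\tau(G) &= 2 \quad \text{(no single vertex covers all three edges)},\\
\nu(G) = 1 &\;\le\; \nu^*(G) = \tfrac32 \;\le\; \tau(G) = 2.
\end{aligned}
$$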

Batch Codes for Asynchronous Recovery of Data

- Mathematics
- IEEE Transactions on Information Theory
- 2021

We propose a new model of asynchronous batch codes that allow for parallel recovery of information symbols from a coded database in an asynchronous manner, i.e. when queries arrive at random times…

Service Rate Region: A New Aspect of Coded Distributed System Design

- Computer Science, Mathematics
- IEEE Transactions on Information Theory
- 2021

This work shows that erasure coding of data objects can flexibly handle skews in the request rates, and demonstrates the effectiveness, for code design, of hybrid codes that combine replication and erasure coding.

## References

Showing 10 of 39 references.

Memory allocation in distributed storage networks

- Computer Science, Mathematics
- 2010 IEEE International Symposium on Information Theory
- 2010

This work considers the problem of distributing a file in a network of storage nodes whose storage budget is limited but at least equals the file size, and finds the optimal symmetric allocation for all coding redundancy constraints using an equivalent approximate problem.

Cassandra: a decentralized structured storage system

- Computer Science
- OPSR
- 2010

Cassandra is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers, while providing highly available service with no single point of…

On the service capacity region of accessing erasure coded content

- Computer Science, Mathematics
- 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
- 2017

This analysis demonstrates that erasure coding makes the system more robust to skews in file popularity than simply replicating a file at multiple servers, and that coding and replication together can make the capacity region larger than either alone.
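A small hypothetical example of the robustness claim, assuming two files $a$ and $b$, three servers of service rate $\mu$, and no queueing effects. The coded layout stores $a$, $b$, $a+b$, so $a$ can be read directly or recovered from $\{b, a+b\}$; the replicated layout stores $a$, $a$, $b$. Adding the constraints of the $a$ and $b$ servers in the coded layout shows $\lambda_a + \lambda_b \le 2\mu$ is necessary, and the greedy routing below (systematic server first, remainder through the recovery group) achieves every such pair. The function names are illustrative.

```python
def coded_feasible(la: float, lb: float, mu: float = 1.0) -> bool:
    """Servers store a, b, a+b (hypothetical layout), each with service rate mu.

    File a is served by server {a} or by the recovery group {b, a+b};
    file b by server {b} or by {a, a+b}. Fill the systematic server first,
    then route the remainder through the recovery group.
    """
    if la < 0 or lb < 0:
        return False
    x, z = min(la, mu), min(lb, mu)  # rates served directly for a and b
    y, w = la - x, lb - z            # rates routed through recovery groups
    server_a = x + w                 # direct a requests plus the group serving b
    server_b = z + y                 # direct b requests plus the group serving a
    server_ab = y + w                # the parity server joins both groups
    return max(server_a, server_b, server_ab) <= mu + 1e-12

def replicated_feasible(la: float, lb: float, mu: float = 1.0) -> bool:
    """Servers store a, a, b: file a has two replicas, file b has one."""
    return 0 <= la <= 2 * mu and 0 <= lb <= mu

print(coded_feasible(0.0, 2.0), replicated_feasible(0.0, 2.0))  # True False
print(coded_feasible(2.0, 0.0), replicated_feasible(2.0, 0.0))  # True True
print(coded_feasible(2.0, 1.0), replicated_feasible(2.0, 1.0))  # False True
```

In this toy setting the replicated layout serves more total demand when $a$ is the popular file, but it breaks as soon as demand shifts toward $b$, whereas the coded layout serves any split of a total rate up to $2\mu$.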

Service Rate Region of Content Access from Erasure Coded Storage

- Computer Science, Mathematics
- 2018 IEEE Information Theory Workshop (ITW)
- 2018

This paper determines the set of request arrival rates that a 3-file coded storage system can serve, and provides an algorithm to maximize the rate of requests served for file $K$ given $\lambda_1, \dots, \lambda_{K-1}$ in the general $K$-file case.

Scarlett: coping with skewed content popularity in mapreduce clusters

- Computer Science
- EuroSys '11
- 2011

Scarlett, a system that replicates blocks based on their popularity by accurately predicting file popularity and working within hard bounds on additional storage, causes minimal interference to running jobs.
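To make "replicate by popularity within a hard storage bound" concrete, here is a hypothetical greedy allocator; it is not Scarlett's published algorithm. It assumes predicted popularity scores per file, a base replication factor, and a fixed budget of extra replicas, and repeatedly gives one more replica to the file with the highest expected load per replica. `allocate_replicas` and its parameters are illustrative names.

```python
import heapq

def allocate_replicas(popularity: dict[str, float], base: int, budget: int) -> dict[str, int]:
    """Give every file `base` replicas, then hand out `budget` extra replicas
    one at a time to the file with the highest expected load per replica."""
    if base < 1:
        raise ValueError("base must be at least 1")
    replicas = {f: base for f in popularity}
    # heapq is a min-heap, so store negated load-per-replica to pop the maximum.
    heap = [(-p / base, f) for f, p in popularity.items()]
    heapq.heapify(heap)
    for _ in range(budget):
        _, f = heapq.heappop(heap)
        replicas[f] += 1
        heapq.heappush(heap, (-popularity[f] / replicas[f], f))
    return replicas

print(allocate_replicas({"hot": 90.0, "warm": 9.0, "cold": 1.0}, base=3, budget=6))
# {'hot': 9, 'warm': 3, 'cold': 3}
```

Here the "hot" file still carries 10 units of load per replica after all six extra copies, more than "warm" (3) or "cold" (about 0.33), so the whole budget goes to it; the budget cap is what keeps the extra storage bounded.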

The Hadoop Distributed File System

- Computer Science
- 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST)
- 2010

The architecture of HDFS is described and experience using HDFS to manage 25 petabytes of enterprise data at Yahoo! is reported on.

Balanced allocations: the heavily loaded case

- Computer Science
- STOC '00
- 2000

It is shown that the multiple-choice processes are fundamentally different from the single-choice variant in that they have "short memory", and that the deviation of the multiple-choice processes from the optimal allocation does not increase with the number of balls, as it does for the single-choice process.
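The contrast between single-choice and multiple-choice allocation is easy to observe in simulation. The sketch below is the standard $d$-choice balls-into-bins process (each ball samples $d$ bins uniformly with replacement and joins the least loaded), run in a heavily loaded regime with many more balls than bins; it only illustrates the gap between maximum and average load and does not reproduce the cited analysis. `max_gap` is an illustrative name.

```python
import random

def max_gap(n_bins: int, n_balls: int, d: int, seed: int = 0) -> float:
    """Throw n_balls into n_bins; each ball samples d bins uniformly at random
    (with replacement) and joins the least loaded one. Returns max load - average."""
    rng = random.Random(seed)
    loads = [0] * n_bins
    for _ in range(n_balls):
        choices = [rng.randrange(n_bins) for _ in range(d)]
        best = min(choices, key=loads.__getitem__)  # least loaded sampled bin
        loads[best] += 1
    return max(loads) - n_balls / n_bins

for d in (1, 2, 3):
    print(d, max_gap(n_bins=1000, n_balls=20_000, d=d))
```

With $d = 1$ the gap keeps growing as more balls are thrown, while with $d \ge 2$ it stays small, which matches the "short memory" behavior described above.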

Redundancy Scheduling in Systems with Bi-Modal Job Service Time Distributions

- Computer Science, Mathematics
- 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
- 2019

This work develops an analogy to a classical urns and balls problem, and uses it to study the queuing time performance of two non-adaptive classical scheduling policies: random and round-robin.

An analysis of Facebook photo caching

- Computer Science
- SOSP
- 2013

This paper instrumented every Facebook-controlled layer of the stack and sampled the resulting event stream to obtain traces covering over 77 million requests for more than 1 million unique photos to study traffic patterns, cache access patterns, geolocation of clients and servers, and to explore correlation between properties of the content and accesses.

Balls and bins with structure: balanced allocations on hypergraphs

- Mathematics, Computer Science
- SODA '08
- 2008

This paper allows each ball to have an associated random set of bins and shows that this model captures structure important to two applications, nearby server selection and load balance in distributed hash tables.