XORing Elephants: Novel Erasure Codes for Big Data
@article{Sathiamoorthy2013XORingEN, title={XORing Elephants: Novel Erasure Codes for Big Data}, author={Maheswaran Sathiamoorthy and Megasthenis Asteris and Dimitris Papailiopoulos and Alexandros G. Dimakis and Ramkumar Vadali and Scott Chen and Dhruba Borthakur}, journal={Proc. VLDB Endow.}, year={2013}, volume={6}, pages={325-336} }
Distributed storage systems for large clusters typically use replication to provide reliability. Recently, erasure codes have been used to reduce the large storage overhead of three-replicated systems. Reed-Solomon codes are the standard design choice and their high repair cost is often considered an unavoidable price to pay for high storage efficiency and high reliability.
This paper shows how to overcome this limitation. We present a novel family of erasure codes that are efficiently…
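As a rough illustration of the trade-off the abstract describes, the following minimal Python sketch compares the storage overhead of three-way replication with that of an erasure-coded stripe, and shows how a single lost block can be rebuilt by XORing the rest of a small group. It is not the construction from the paper: the (10, 4) stripe parameters, the group size of 5, and the helper names `xor_blocks` and `storage_overhead` are assumptions chosen for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def storage_overhead(data_blocks, parity_blocks):
    """Extra storage expressed as a fraction of the original data size."""
    return parity_blocks / data_blocks


# Hypothetical group: 5 data blocks protected by one XOR parity block.
group = [bytes([i] * 4) for i in range(1, 6)]
local_parity = xor_blocks(group)

# Lose one block, then rebuild it from the 4 survivors plus the parity block.
lost_index = 2
survivors = [block for i, block in enumerate(group) if i != lost_index]
rebuilt = xor_blocks(survivors + [local_parity])
blocks_read_for_repair = len(survivors) + 1

# Three-way replication stores 2 extra copies of every block.
replication_overhead = storage_overhead(1, 2)
# Assumed (10, 4) Reed-Solomon stripe: 4 parity blocks per 10 data blocks.
# Conventional Reed-Solomon repair of one block reads k = 10 blocks.
rs_overhead = storage_overhead(10, 4)
```

In this toy setup the XOR group repairs a block by reading 5 blocks, while the assumed Reed-Solomon stripe would read 10; the price is the extra parity block that each group stores.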
676 Citations
A Tale of Two Erasure Codes in HDFS
- Computer Science
- FAST
- 2015
HACFS is a new erasure-coded storage system that uses two different erasure codes and dynamically adapts to workload changes, using a fast code to optimize recovery performance and a compact code to reduce storage overhead.
Opening the Chrysalis: On the Real Repair Performance of MSR Codes
- Computer Science
- FAST
- 2016
This work provides a performance analysis of Butterfly codes, systematic MSR codes with optimal repair I/O, and shows that, with new distributed system features and careful implementation, the theoretically expected repair performance of MSR codes can be achieved.
expanCodes: Tailored LDPC Codes for Big Data Storage
- Computer Science
- 2016 IEEE 14th Intl Conf on Dependable, Autonomic and Secure Computing, 14th Intl Conf on Pervasive Intelligence and Computing, 2nd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech)
- 2016
A novel method is proposed to construct a family of LDPC codes, expanCodes, with expandable sizes, which allows the encoding and decoding complexity to remain unchanged as the size of the LDPC codes increases and can be achieved without additional computation and repair traffic.
On the implementation of Zigzag codes for distributed storage system
- Computer Science
- 2015 IEEE International Conference on Big Data (Big Data)
- 2015
This work first builds a general system on Hadoop to evaluate the encoding, decoding and repair performance of different codes, and then implements Zigzag codes, an MDS array code with optimal repair property, in the practical system.
Large LDPC Codes for Big Data Storage
- Computer Science
- 2015
This paper investigates in detail the repair traffic incurred when applying Low Density Parity Check (LDPC) codes with relatively large block sizes and shows that a significant improvement in reliability can be achieved by using large LDPC codes without increasing the repair latency and network traffic, especially for multiple erasures.
Hybrid-RC: Flexible Erasure Codes with Optimized Recovery Performance and Low Storage Overhead
- Computer Science
- 2017 IEEE 36th Symposium on Reliable Distributed Systems (SRDS)
- 2017
This paper presents Hybrid Regenerating Codes (Hybrid-RC), a new set of erasure codes with optimized recovery performance and low storage overhead, and shows that Hybrid-RC reduces the reconstruction cost by up to 21% compared to the Local Reconstruction Codes with the same storage overhead.
Minimum storage BASIC codes: A system perspective
- Computer Science
- 2013 IEEE International Conference on Big Data
- 2013
This paper integrates one construction of the minimum storage BASIC (MS-BASIC) codes into a Hadoop HDFS cluster testbed with up to 22 storage nodes and demonstrates that MS-BASIC codes conform to the theoretical findings and achieve recovery bandwidth savings compared to the conventional recovery approach based on RS codes.
Optimal locally repairable codes and connections to matroid theory
- Computer Science
- 2013 IEEE International Symposium on Information Theory
- 2013
This work presents an explicit and simple to implement construction of optimal LRCs, for code parameters previously established by existence results, and derives a new result on the matroid represented by the code's generator matrix.
Have a Seat on the ErasureBench: Easy Evaluation of Erasure Coding Libraries for Distributed Storage Systems
- Computer Science
- 2016 IEEE 35th Symposium on Reliable Distributed Systems Workshops (SRDSW)
- 2016
The experiments show that LRC and RS codes require the same repair throughput when used with small storage nodes, since cluster and network management traffic dominate in this regime, and illustrate the theoretical and practical tradeoffs between the storage overhead and repair bandwidth of RS and LRC codes.
Beehive: Erasure Codes for Fixing Multiple Failures in Distributed Storage Systems
- Computer Science
- IEEE Transactions on Parallel and Distributed Systems
- 2017
This paper proposes Beehive codes, designed for optimizing the volume of network transfers to fix the data on multiple failed storage servers, and implements Beehive codes in C++ and evaluates their performance on Amazon EC2.
References
Rethinking erasure codes for cloud file systems: minimizing I/O for recovery and degraded reads
- Computer Science
- FAST
- 2012
An algorithm is presented that finds the optimal number of codeword symbols needed for recovery for any XOR-based erasure code and produces recovery schedules that use a minimum amount of data.
Simple regenerating codes: Network coding for cloud storage
- Computer Science
- 2012 Proceedings IEEE INFOCOM
- 2012
This paper introduces the first family of distributed storage codes that have simple look-up repair and can achieve rates up to 2/3; the constructions are very simple to implement and perform exact repair by simple XORing of packets.
Locally Repairable Codes
- Computer Science
- IEEE Transactions on Information Theory
- 2014
This paper explores the repair metric of locality, which corresponds to the number of disk accesses required during a single node repair, and shows the existence of optimal locally repairable codes (LRCs) that achieve this tradeoff.
Interference Alignment in Regenerating Codes for Distributed Storage: Necessity and Code Constructions
- Computer Science
- IEEE Transactions on Information Theory
- 2012
The constructions presented in this paper are the first explicit constructions of regenerating codes that achieve the cut-set bound, and interference alignment is a theme that runs throughout the paper.
MDS array codes with optimal rebuilding
- Computer Science
- 2011 IEEE International Symposium on Information Theory Proceedings
- 2011
A new family of r-erasure correcting MDS array codes is constructed that has optimal rebuilding ratio of 1 over r in the case of a single erasure.
Distributed Data Storage with Minimum Storage Regenerating Codes - Exact and Functional Repair are Asymptotically Equally Efficient
- Computer Science
- ArXiv
- 2010
The main result is that, for any (n,k), and sufficiently large file sizes, there is no extra cost of exact regeneration over functional regeneration in terms of the repair bandwidth per bit of regenerated data.
Pyramid Codes: Flexible Schemes to Trade Space for Access Efficiency in Reliable Data Storage Systems
- Computer Science
- Sixth IEEE International Symposium on Network Computing and Applications (NCA 2007)
- 2007
To flexibly explore the trade-offs between storage space and access efficiency in reliable data storage systems, we describe two classes of erasure resilient coding schemes: basic and generalized…
Self-repairing homomorphic codes for distributed storage systems
- Computer Science
- 2011 Proceedings IEEE INFOCOM
- 2011
This work proposes as an alternative a new family of codes to improve the maintenance process, called self-repairing codes (SRC), with the following salient features: encoded fragments can be repaired directly from other subsets of encoded fragments by downloading less data than the size of the complete object, and repairs can proceed in parallel, which allows reconstruction with lower latency.
A Survey on Network Codes for Distributed Storage
- Computer Science
- Proceedings of the IEEE
- 2011
An overview of the research results on network coding techniques is provided, establishing that maintenance bandwidth can be reduced by orders of magnitude compared to standard erasure codes.