# Rateless Codes for Near-Perfect Load Balancing in Distributed Matrix-Vector Multiplication

```bibtex
@article{Mallick2019RatelessCF,
  title   = {Rateless Codes for Near-Perfect Load Balancing in Distributed Matrix-Vector Multiplication},
  author  = {Ankur Mallick and Malhar Chaudhari and Gauri Joshi},
  journal = {Proceedings of the ACM on Measurement and Analysis of Computing Systems},
  year    = {2019},
  volume  = {3},
  pages   = {1--40}
}
```

Large-scale machine learning and data mining applications require computer systems to perform massive matrix-vector and matrix-matrix multiplication operations that need to be parallelized across multiple nodes. The presence of straggling nodes -- computing nodes that unpredictably slow down or fail -- is a major bottleneck in such distributed computations. Ideal load balancing strategies that dynamically allocate more tasks to faster nodes require knowledge or monitoring of node speeds as well…
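The rateless-coding idea in the abstract can be illustrated with a toy sketch: encode the rows of A as random linear combinations, let each worker return one coded inner product, and decode as soon as any m results arrive, regardless of which workers straggled. This sketch uses dense Gaussian combinations and a direct linear solve purely for brevity; the paper's fountain (LT) codes instead use sparse combinations drawn from a degree distribution and a low-complexity peeling decoder, and all sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

# Encode: each coded row is a random linear combination of the rows of A.
# A rateless code can keep generating such rows indefinitely; here we make
# a fixed batch with 50% redundancy for illustration.
num_coded = int(1.5 * m)
G = rng.standard_normal((num_coded, m))   # random generator matrix (stand-in for LT encoding)
coded_rows = G @ A                        # coded rows, distributed across workers

# Each worker computes one coded inner product; stragglers' results never arrive.
results = coded_rows @ x                  # all potential worker results
arrived = rng.permutation(num_coded)[:m]  # whichever m results happen to arrive first

# Decode: solve G_sub @ (A @ x) = received results for the m true products.
Ax_est = np.linalg.solve(G[arrived], results[arrived])
assert np.allclose(Ax_est, A @ x)
```

Because decoding needs only *some* m results rather than results from m *specific* workers, partial work done by slow nodes is never wasted, which is the source of the load-balancing benefit.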


## 90 Citations

Rateless Codes for Distributed Computations with Sparse Compressed Matrices

- Computer Science, 2019 IEEE International Symposium on Information Theory (ISIT)
- 2019

This work proposes a balanced row-allocation strategy that assigns the rows of a sparse matrix to workers so that each worker receives an equal number of non-zero matrix entries, and achieves significantly lower overall latency than conventional sparse matrix-vector multiplication strategies.
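One simple way to realize such an allocation is a greedy longest-processing-time heuristic: assign each row, heaviest first, to the currently least-loaded worker. This is only an illustrative sketch of the balancing goal described above, not necessarily the paper's exact strategy, and the nnz counts are made up.

```python
import heapq

# Non-zero count per row of a hypothetical sparse matrix (illustrative numbers).
row_nnz = [9, 1, 7, 3, 3, 5, 2, 8, 4, 6]
num_workers = 3

# Greedy longest-processing-time allocation: process rows in decreasing nnz
# order, always giving the next row to the least-loaded worker.
heap = [(0, w) for w in range(num_workers)]  # (current load, worker id)
heapq.heapify(heap)
assignment = {w: [] for w in range(num_workers)}
for row, nnz in sorted(enumerate(row_nnz), key=lambda t: -t[1]):
    load, w = heapq.heappop(heap)
    assignment[w].append(row)
    heapq.heappush(heap, (load + nnz, w))

loads = {w: sum(row_nnz[r] for r in rows) for w, rows in assignment.items()}
print(loads)  # each worker ends up with 16 of the 48 non-zeros
```

Balancing non-zeros rather than row counts matters because a sparse row's multiplication cost is proportional to its number of non-zeros, not its length.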

Rateless Codes for Distributed Non-linear Computations

- Computer Science, 2021 11th International Symposium on Topics in Coding (ISTC)
- 2021

This work proposes a coded computing strategy for mitigating the effect of stragglers on non-linear distributed computations and shows that erasure codes can be used to generate and compute random linear combinations of functions at the nodes such that the original function can be computed as long as a subset of nodes return their computations.

An Application of Storage-Optimal MatDot Codes for Coded Matrix Multiplication: Fast k-Nearest Neighbors Estimation

- Computer Science, 2018 IEEE International Conference on Big Data (Big Data)
- 2018

Two techniques to parallelize MRPT are proposed that exploit data and model parallelism, respectively, by dividing both data storage and computation among the nodes of a distributed computing cluster.

Erasure Coding for Distributed Matrix Multiplication for Matrices With Bounded Entries

- Computer Science, IEEE Communications Letters
- 2019

This work presents a novel coding strategy for distributed matrix multiplication when the absolute values of the matrix entries are sufficiently small, and demonstrates a tradeoff between the assumed absolute-value bounds on the matrix entries and the recovery threshold.

Coded computation over heterogeneous clusters

- Computer Science, 2017 IEEE International Symposium on Information Theory (ISIT)
- 2017

This paper proposes the Heterogeneous Coded Matrix Multiplication (HCMM) algorithm for performing distributed matrix multiplication over heterogeneous clusters, proves that it is asymptotically optimal, and provides numerical results demonstrating speedups of up to 49% and 34% for HCMM over the “uncoded” and “homogeneous coded” schemes, respectively.

Fast and Efficient Distributed Matrix-vector Multiplication Using Rateless Fountain Codes

- Computer Science, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2019

Compared to recently proposed fixed-rate erasure coding strategies, which ignore partial work done by straggling nodes, rateless codes achieve a significantly lower overall delay and a smaller computational overhead.

Price of Precision in Coded Distributed Matrix Multiplication: A Dimensional Analysis

- Computer Science, 2021 IEEE Information Theory Workshop (ITW)
- 2021

A rudimentary asymptotic dimensional analysis of AMD codes inspired by the Generalized Degrees of Freedom (GDoF) framework previously developed for wireless networks indicates that for the same upload/storage, once the precision levels of the task assignments are accounted for, AMD codes are not better than a replication scheme which assigns the full computation task to every server.

Coded Computation Over Heterogeneous Clusters

- Computer Science, IEEE Transactions on Information Theory
- 2019

This paper proposes the heterogeneous coded matrix multiplication (HCMM) algorithm for performing distributed matrix multiplication over heterogeneous clusters, which is provably asymptotically optimal for a broad class of processing-time distributions, and develops a heuristic HCMM load-allocation algorithm for the distributed implementation of budget-limited computation tasks.

C3LES: Codes for Coded Computation that Leverage Stragglers

- Computer Science, 2018 IEEE Information Theory Workshop (ITW)
- 2018

A fine-grained model is proposed that quantifies the level of non-trivial coding needed to obtain the benefits of coding in matrix-vector computation and allows us to leverage partial computations performed by the straggler nodes.

Universally Decodable Matrices for Distributed Matrix-Vector Multiplication

- Computer Science, 2019 IEEE International Symposium on Information Theory (ISIT)
- 2019

A class of distributed matrix-vector multiplication schemes based on codes in the Rosenbloom-Tsfasman metric and universally decodable matrices is presented; these schemes effectively leverage partial computations performed by stragglers.

## References

Showing 1–10 of 70 references.

Coded Distributed Computing for Inverse Problems

- Computer Science, NIPS
- 2017

This paper designs a novel error-correcting-code inspired technique for solving linear inverse problems under specific iterative methods in a parallelized implementation affected by stragglers, and provably shows that this coded-computation technique can reduce the mean-squared error under a computational deadline constraint.

Polynomial Codes: an Optimal Design for High-Dimensional Coded Matrix Multiplication

- Computer Science, NIPS
- 2017

We consider a large-scale matrix multiplication problem where the computation is carried out using a distributed system with a master node and multiple worker nodes, where each worker can store parts…
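The core trick behind polynomial codes can be sketched in the matrix-vector setting: split A into k row blocks, have each worker evaluate the matrix polynomial sum_j A_j z^j at its own point and multiply by x, and interpolate from any k results. This toy real-valued version (sizes and evaluation points are arbitrary choices) reduces to an MDS/Reed-Solomon-style code; the paper's full construction targets matrix-matrix multiplication.

```python
import numpy as np

rng = np.random.default_rng(1)
k, rows_per_block, n = 3, 2, 4
A = rng.standard_normal((k * rows_per_block, n))
x = rng.standard_normal(n)
blocks = np.split(A, k)  # row blocks A_0, A_1, A_2

# Worker i evaluates A(z_i) = sum_j A_j * z_i**j at its own point z_i,
# then computes A(z_i) @ x. Its result is the degree-(k-1) vector
# polynomial sum_j (A_j x) z^j evaluated at z_i.
num_workers = 5
z = np.arange(1, num_workers + 1, dtype=float)
worker_results = [sum(blocks[j] * zi**j for j in range(k)) @ x for zi in z]

# Any k workers suffice: interpolate via a Vandermonde solve.
fast = [4, 1, 3]  # indices of whichever k workers finished first
V = np.vander(z[fast], k, increasing=True)
coeffs = np.linalg.solve(V, np.stack([worker_results[i] for i in fast]))
assert np.allclose(np.concatenate(coeffs), A @ x)
```

The recovery threshold k is independent of which workers respond, which is exactly the optimality property the polynomial-codes paper establishes for the matrix-matrix case.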

Coded convolution for parallel and distributed computing within a deadline

- Computer Science, 2017 IEEE International Symposium on Information Theory (ISIT)
- 2017

The utility of a novel asymptotic failure exponent analysis for distributed systems is established through the problem of coded convolution of two long vectors using parallel processors in the presence of "stragglers".

Numpywren: Serverless Linear Algebra

- Computer Science, ArXiv
- 2018

Numpywren, a system for linear algebra built on a serverless architecture, is presented, along with LAmbdaPACK, a domain-specific language designed to implement highly parallel linear algebra algorithms in a serverless setting; the work highlights how cloud providers could better support these types of computations through small changes in their infrastructure.

Improving Distributed Gradient Descent Using Reed-Solomon Codes

- Computer Science, 2018 IEEE International Symposium on Information Theory (ISIT)
- 2018

This work adopts the framework of Tandon et al. and presents a deterministic scheme that, for a prescribed per-machine computational effort, recovers the gradient from the theoretically minimal number of machines via an $O(f^2)$ decoding algorithm.

Matrix sparsification for coded matrix multiplication

- Computer Science, 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
- 2017

This work shows that the Short-Dot scheme is optimal if a Maximum Distance Separable (MDS) matrix is fixed, and proposes a new encoding scheme that achieves strictly larger sparsity than existing schemes.

Encoded distributed optimization

- Computer Science, 2017 IEEE International Symposium on Information Theory (ISIT)
- 2017

It is shown that under moderate amounts of redundancy, it is possible to recover a close approximation to the solution under node failures and obtain an explicit error bound for a specific construction that uses Paley graphs.

Efficient Straggler Replication in Large-Scale Parallel Computing

- Computer Science, ACM Trans. Model. Perform. Evaluation Comput. Syst.
- 2019

This article provides a framework to analyze this latency-cost tradeoff and find the best replication strategy by answering design questions, such as when to replicate straggling tasks, how many replicas to launch, and whether to kill the original copy or not.

Speeding Up Distributed Machine Learning Using Codes

- Computer Science, IEEE Transactions on Information Theory
- 2018

This paper focuses on two of the most basic building blocks of distributed learning algorithms: matrix multiplication and data shuffling, and uses codes to reduce communication bottlenecks, exploiting the excess in storage.

Learning a Code: Machine Learning for Approximate Non-Linear Coded Computation

- Computer Science, ArXiv
- 2018

This work proposes the first learning-based approach for designing codes, and presents the first coding-theoretic solution that can provide resilience for any non-linear (differentiable) computation.