Rateless Codes for Near-Perfect Load Balancing in Distributed Matrix-Vector Multiplication

@article{Mallick2019RatelessCF,
  title={Rateless Codes for Near-Perfect Load Balancing in Distributed Matrix-Vector Multiplication},
  author={Ankur Mallick and Malhar Chaudhari and Gauri Joshi},
  journal={Proceedings of the ACM on Measurement and Analysis of Computing Systems},
  year={2019},
  volume={3},
  pages={1--40}
}
Large-scale machine learning and data mining applications require computer systems to perform massive matrix-vector and matrix-matrix multiplication operations that need to be parallelized across multiple nodes. The presence of straggling nodes -- computing nodes that unpredictably slow down or fail -- is a major bottleneck in such distributed computations. Ideal load balancing strategies that dynamically allocate more tasks to faster nodes require knowledge or monitoring of node speeds as well…
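As a concrete illustration of the idea in the abstract, below is a minimal single-process sketch in Python (assuming an LT code with a robust soliton degree distribution, a toy 200x50 matrix, and roughly 40% encoding overhead; it is not the authors' implementation). The master encodes the rows of A into random sums of rows, workers return inner products of the coded rows with x as they finish, and the master recovers b = Ax with a peeling decoder once enough coded products have arrived.

# Minimal sketch of rateless (LT-style) coded matrix-vector multiplication.
# The degree distribution parameters, matrix sizes, and ~40% encoding overhead
# are illustrative assumptions, not the paper's exact configuration.
import numpy as np

rng = np.random.default_rng(0)

def robust_soliton_pmf(m, c=0.05, delta=0.05):
    """Robust soliton degree distribution over degrees {1, ..., m}."""
    S = c * np.log(m / delta) * np.sqrt(m)
    d = np.arange(1, m + 1, dtype=float)
    rho = np.where(d == 1, 1.0 / m, 1.0 / (d * (d - 1)))
    tau = np.zeros(m)
    spike = int(round(m / S))
    tau[: spike - 1] = S / (m * d[: spike - 1])      # degrees 1 .. spike-1
    if 1 <= spike <= m:
        tau[spike - 1] = S * np.log(S / delta) / m   # spike at degree ~m/S
    pmf = rho + tau
    return pmf / pmf.sum()

def encode_rows(A, num_coded):
    """Each coded row is the sum of a random subset of rows of A."""
    m = A.shape[0]
    pmf = robust_soliton_pmf(m)
    degrees = rng.choice(np.arange(1, m + 1), size=num_coded, p=pmf)
    supports, coded_rows = [], []
    for deg in degrees:
        idx = rng.choice(m, size=deg, replace=False)
        supports.append(set(idx))
        coded_rows.append(A[idx].sum(axis=0))
    return supports, np.array(coded_rows)

def peel_decode(supports, coded_products, m):
    """Recover b = A x from coded inner products by iterative peeling."""
    b = np.full(m, np.nan)
    supports = [set(s) for s in supports]
    vals = list(coded_products)
    progress = True
    while progress:
        progress = False
        for i, s in enumerate(supports):
            if len(s) == 1:                          # degree-1 symbol reveals one entry of b
                j = s.pop()
                if np.isnan(b[j]):
                    b[j] = vals[i]
                    progress = True
        for i, s in enumerate(supports):             # subtract newly known entries everywhere
            known = [j for j in s if not np.isnan(b[j])]
            for j in known:
                vals[i] -= b[j]
                s.discard(j)
    return b

# Toy example: the master encodes with ~40% overhead, workers compute inner
# products of coded rows with x (simulated here in one shot), the master peels.
m, n = 200, 50
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
supports, coded_rows = encode_rows(A, int(1.4 * m))
coded_products = coded_rows @ x
b_hat = peel_decode(supports, coded_products, m)
ok = ~np.isnan(b_hat)
err = np.abs(b_hat[ok] - (A @ x)[ok]).max() if ok.any() else float("nan")
print(f"recovered {ok.sum()}/{m} entries of Ax, max abs error {err:.2e}")

Because the code is rateless, the master simply keeps collecting coded products until decoding succeeds, so faster workers naturally contribute more results and the partial work of straggling nodes is not wasted.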
Citations

Rateless Codes for Distributed Computations with Sparse Compressed Matrices
TLDR
This work proposes a balanced row-allocation strategy that assigns rows of a sparse matrix to workers so that each worker receives an equal number of non-zero matrix entries, and achieves significantly lower overall latency than conventional sparse matrix-vector multiplication strategies (a minimal sketch of such a split appears below).
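For illustration, here is a minimal sketch of the kind of balanced row allocation the summary above describes, using a greedy contiguous split of a SciPy CSR matrix; the split rule, matrix sizes, and density are assumptions for the example, not the cited paper's exact algorithm.

# Minimal sketch: split rows of a sparse CSR matrix across workers so that
# each worker gets roughly the same number of non-zero entries.
import numpy as np
from scipy.sparse import random as sparse_random

def balanced_row_split(A_csr, num_workers):
    """Return contiguous row ranges with roughly equal non-zero counts."""
    nnz_per_row = np.diff(A_csr.indptr)           # non-zeros in each row
    cum = np.cumsum(nnz_per_row)
    targets = A_csr.nnz * np.arange(1, num_workers) / num_workers
    cuts = np.searchsorted(cum, targets)          # rows closest to each nnz target
    bounds = [0, *cuts.tolist(), A_csr.shape[0]]
    return [(bounds[i], bounds[i + 1]) for i in range(num_workers)]

A = sparse_random(1000, 500, density=0.02, format="csr", random_state=0)
for lo, hi in balanced_row_split(A, 4):
    print(f"rows {lo:4d}-{hi:4d}: nnz = {A[lo:hi].nnz}")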
Rateless Codes for Distributed Non-linear Computations
TLDR
This work proposes a coded computing strategy for mitigating the effect of stragglers on non-linear distributed computations, and shows that erasure codes can be used to generate and compute random linear combinations of functions at the nodes such that the original function can be recovered as long as a large enough subset of nodes return their computations.
An Application of Storage-Optimal MatDot Codes for Coded Matrix Multiplication: Fast k-Nearest Neighbors Estimation
TLDR
This work proposes two techniques to parallelize MRPT that exploit data and model parallelism, respectively, by dividing both data storage and computation among the nodes of a distributed computing cluster.
Erasure Coding for Distributed Matrix Multiplication for Matrices With Bounded Entries
TLDR
This work presents a novel coding strategy for distributed matrix multiplication when the absolute values of the matrix entries are sufficiently small, and demonstrates a tradeoff between the assumed absolute-value bounds on the matrix entries and the recovery threshold.
Coded computation over heterogeneous clusters
TLDR
This paper proposes the Heterogeneous Coded Matrix Multiplication (HCMM) algorithm for performing distributed matrix multiplication over heterogeneous clusters, shows that it is provably asymptotically optimal, and provides numerical results demonstrating significant speedups of up to 49% and 34% for HCMM in comparison to the “uncoded” and “homogeneous coded” schemes.
Fast and Efficient Distributed Matrix-vector Multiplication Using Rateless Fountain Codes
TLDR
Compared to recently proposed fixed-rate erasure coding strategies, which ignore partial work done by straggling nodes, rateless codes have a significantly lower overall delay and a smaller computational overhead.
Price of Precision in Coded Distributed Matrix Multiplication: A Dimensional Analysis
TLDR
A rudimentary asymptotic dimensional analysis of AMD codes inspired by the Generalized Degrees of Freedom (GDoF) framework previously developed for wireless networks indicates that for the same upload/storage, once the precision levels of the task assignments are accounted for, AMD codes are not better than a replication scheme which assigns the full computation task to every server.
Coded Computation Over Heterogeneous Clusters
TLDR
This paper proposes the heterogeneous coded matrix multiplication (HCMM) algorithm for performing distributed matrix multiplication over heterogeneous clusters, shows that it is provably asymptotically optimal for a broad class of processing time distributions, and develops a heuristic HCMM load-allocation algorithm for the distributed implementation of budget-limited computation tasks.
C3LES: Codes for Coded Computation that Leverage Stragglers
TLDR
A fine-grained model is proposed that quantifies the level of non-trivial coding needed to obtain the benefits of coding in matrix-vector computation and allows us to leverage partial computations performed by the straggler nodes.
Universally Decodable Matrices for Distributed Matrix-Vector Multiplication
TLDR
This work presents a class of distributed matrix-vector multiplication schemes based on codes in the Rosenbloom-Tsfasman metric and universally decodable matrices, which allow us to effectively leverage partial computations performed by stragglers.

References

Showing 1-10 of 70 references
Coded Distributed Computing for Inverse Problems
TLDR
This paper designs a novel error-correcting-code inspired technique for solving linear inverse problems under specific iterative methods in a parallelized implementation affected by stragglers, and provably shows that this coded-computation technique can reduce the mean-squared error under a computational deadline constraint.
Polynomial Codes: an Optimal Design for High-Dimensional Coded Matrix Multiplication
We consider a large-scale matrix multiplication problem where the computation is carried out using a distributed system with a master node and multiple worker nodes, where each worker can store parts
Coded convolution for parallel and distributed computing within a deadline
TLDR
The utility of a novel asymptotic failure exponent analysis for distributed systems is established through the problem of coded convolution of two long vectors using parallel processors in the presence of "stragglers".
Numpywren: Serverless Linear Algebra
TLDR
Numpywren is presented, a system for linear algebra built on a serverless architecture, along with LAmbdaPACK, a domain-specific language designed to implement highly parallel linear algebra algorithms in a serverless setting, which highlights how cloud providers could better support these types of computations through small changes in their infrastructure.
Improving Distributed Gradient Descent Using Reed-Solomon Codes
TLDR
This work adopts the framework of Tandon et al. and presents a deterministic scheme that, for a prescribed per-machine computational effort, recovers the gradient from the least number of machines theoretically permissible, via an O(f^2) decoding algorithm.
Matrix sparsification for coded matrix multiplication
TLDR
This work shows that the Short-Dot scheme is optimal if a Maximum Distance Separable (MDS) matrix is fixed, and proposes a new encoding scheme that can achieve a strictly larger sparsity than the existing schemes.
Encoded distributed optimization
TLDR
It is shown that under moderate amounts of redundancy, it is possible to recover a close approximation to the solution under node failures and obtain an explicit error bound for a specific construction that uses Paley graphs.
Efficient Straggler Replication in Large-Scale Parallel Computing
TLDR
This article provides a framework to analyze this latency-cost tradeoff and find the best replication strategy by answering design questions, such as when to replicate straggling tasks, how many replicas to launch, and whether to kill the original copy or not.
Speeding Up Distributed Machine Learning Using Codes
TLDR
This paper focuses on two of the most basic building blocks of distributed learning algorithms: matrix multiplication and data shuffling, and uses codes to reduce communication bottlenecks, exploiting the excess in storage.
Learning a Code: Machine Learning for Approximate Non-Linear Coded Computation
TLDR
This work proposes the first learning-based approach for designing codes, and presents the first coding-theoretic solution that can provide resilience for any non-linear (differentiable) computation.