# Stochastic Gradient Coding for Straggler Mitigation in Distributed Learning

```
@article{Bitar2020StochasticGC,
  title={Stochastic Gradient Coding for Straggler Mitigation in Distributed Learning},
  author={Rawad Bitar and Mary Wootters and Salim el Rouayheb},
  journal={IEEE Journal on Selected Areas in Information Theory},
  year={2020},
  volume={1},
  pages={277-291}
}
```
• Published 14 May 2019
• Computer Science
• IEEE Journal on Selected Areas in Information Theory
We consider distributed gradient descent in the presence of stragglers. Recent work on *gradient coding* and *approximate gradient coding* has shown how to add redundancy in distributed gradient descent to guarantee convergence even if some workers are *stragglers*, that is, slow or non-responsive. In this work we propose an approximate gradient coding scheme called *Stochastic Gradient Coding* (SGC), which works when the stragglers…
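The abstract describes adding redundancy so the master can estimate the full gradient from whichever workers respond. A minimal sketch of this idea, assuming a random replicated data placement and a simple rescaled-sum aggregation rule (the names and the aggregation rule here are illustrative assumptions in the spirit of approximate gradient coding, not the exact SGC construction from the paper):

```python
import numpy as np

def assign_data(n_workers, n_points, redundancy, rng):
    """Replicate each data point on `redundancy` distinct random workers."""
    assignment = [[] for _ in range(n_workers)]
    for j in range(n_points):
        for w in rng.choice(n_workers, size=redundancy, replace=False):
            assignment[w].append(j)
    return assignment

def aggregate(partial_grads, responded, redundancy):
    """Master sums the partial gradients of the workers that responded,
    rescaled by the replication factor. With no stragglers this recovers
    the exact full gradient; with random stragglers it yields a noisy
    estimate whose quality improves with the redundancy."""
    return sum(partial_grads[w] for w in responded) / redundancy
```

With every worker responding, each data point is counted exactly `redundancy` times, so the rescaled sum equals the full gradient; dropping a random subset of workers degrades this gracefully rather than catastrophically, which is the point of the scheme.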
37 Citations

## Citations

### Stochastic Gradient Coding for Flexible Straggler Mitigation in Distributed Learning

• Computer Science
2019 IEEE Information Theory Workshop (ITW)
• 2019
It is proved that the convergence rate of SGC mirrors that of batched Stochastic Gradient Descent for the $l_{2}$ loss function, and it is shown how the convergence rate can improve with the redundancy.

• Computer Science
IEEE INFOCOM 2021 - IEEE Conference on Computer Communications
• 2021
A Live Gradient Compensation (LGC) strategy to incorporate the one-step delayed gradients from stragglers, aiming to accelerate learning process and utilize the straggler nodes simultaneously is developed, and the numerical results demonstrate the effectiveness of the proposed strategy.

• Computer Science
IEEE Journal on Selected Areas in Information Theory
• 2021
This paper characterizes the optimum communication cost for heterogeneous distributed systems with *arbitrary* data placement, and proposes an approximate gradient coding scheme for the cases when the repetition in data placement is smaller than what is needed to meet the restriction imposed on communication cost.

### Optimization-based Block Coordinate Gradient Coding for Mitigating Partial Stragglers in Distributed Learning

• Computer Science
ArXiv
• 2022
This paper designs a new gradient coding scheme for mitigating partial stragglers in distributed learning and considers a distributed system consisting of one master and N workers, characterized by a general partial straggler model and focuses on solving a general large-scale machine learning problem with L model parameters using gradient coding.

### Lightweight Projective Derivative Codes for Compressed Asynchronous Gradient Descent

• Computer Science
ICML
• 2022
A novel algorithm is proposed that encodes the partial derivatives themselves and furthermore optimizes the codes by performing lossy compression on the derivative codewords, maximizing the information contained in each codeword while minimizing the information between the codewords.

### Optimization-based Block Coordinate Gradient Coding

• Computer Science
2021 IEEE Global Communications Conference (GLOBECOM)
• 2021
This paper considers a distributed computation system consisting of one master and $N$ workers characterized by a general partial straggler model and focuses on solving a general large-scale machine learning problem with $L$ model parameters, obtaining an optimal solution using a stochastic projected subgradient method and proposing two low-complexity approximate solutions with closed-form expressions for the stochastic optimization problem.

### Approximate Gradient Coding With Optimal Decoding

• Computer Science
IEEE Journal on Selected Areas in Information Theory
• 2021
This work introduces novel approximate gradient codes based on expander graphs, in which each machine receives exactly two blocks of data points, and demonstrates empirically that these schemes achieve near-optimal error in the random setting and converge faster than algorithms which do not use the optimal decoding coefficients.

### Gradient Coding with Dynamic Clustering for Straggler-Tolerant Distributed Learning

• Computer Science
IEEE Transactions on Communications
• 2022
A novel paradigm of dynamic coded computation is introduced, which assigns redundant data to workers to acquire the ability to dynamically choose from among a set of possible codes depending on the past straggling behavior, called GC-DC, and regulates the number of stragglers in each cluster by dynamically forming the clusters at each iteration.

### Approximate Gradient Coding for Heterogeneous Nodes

• Computer Science
2021 IEEE Information Theory Workshop (ITW)
• 2021
This work introduces a heterogeneous straggler model where nodes are categorized into two classes, slow and active, and modifies the existing gradient coding schemes by shuffling the training data among workers to better utilize the training data stored at slow nodes.

### LAGC: Lazily Aggregated Gradient Coding for Straggler-Tolerant and Communication-Efficient Distributed Learning

• Computer Science
IEEE Transactions on Neural Networks and Learning Systems
• 2021
A unified analysis of gradient coding, worker grouping, and adaptive worker selection techniques in terms of wall-clock time, communication, and computation complexity measures shows that G-LAG provides the best wall-clock time and communication performance while maintaining a low computational cost.

## References

Showing 1-10 of 66 references

### Distributed Stochastic Gradient Descent Using LDGM Codes

• Computer Science
2019 IEEE International Symposium on Information Theory (ISIT)
• 2019
In the proposed system, it may take longer time than existing GC methods to recover the gradient information completely, however, it enables the master node to obtain a high-quality unbiased estimator of the gradient at low computational cost and it leads to overall performance improvement.

### Speeding Up Distributed Machine Learning Using Codes

• Computer Science
IEEE Transactions on Information Theory
• 2018
This paper focuses on two of the most basic building blocks of distributed learning algorithms: matrix multiplication and data shuffling, and uses codes to reduce communication bottlenecks, exploiting the excess in storage.

• Computer Science
2019 IEEE International Symposium on Information Theory (ISIT)
• 2019
This work proposes a class of approximate gradient codes based on balanced incomplete block designs (BIBDs), and shows that the approximation error for these codes depends only on the number of stragglers, and thus, adversarial straggler selection has no advantage over random selection.

### Near-Optimal Straggler Mitigation for Distributed Gradient Methods

• Computer Science
2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
• 2018
This work proves that the proposed Batched Coupon's Collector (BCC) scheme is robust to a near optimal number of random stragglers, and reduces the run-time by up to 85.4% over Amazon EC2 clusters when compared with other straggler mitigation strategies.

### Improving Distributed Gradient Descent Using Reed-Solomon Codes

• Computer Science
2018 IEEE International Symposium on Information Theory (ISIT)
• 2018
This work adopts the framework of Tandon et al. and presents a deterministic scheme that, for a prescribed per-machine computational effort, recovers the gradient from the least number of machines theoretically permissible, via an $O(f^{2})$ decoding algorithm.

### Gradient Coding From Cyclic MDS Codes and Expander Graphs

• Computer Science
IEEE Transactions on Information Theory
• 2020
This paper designs novel gradient codes using tools from classical coding theory, namely, cyclic MDS codes, which compare favorably with existing solutions, both in the applicable range of parameters and in the complexity of the involved algorithms.

### Straggler Mitigation in Distributed Optimization Through Data Encoding

• Computer Science
NIPS
• 2017
This paper proposes several encoding schemes, and demonstrates that popular batch algorithms, such as gradient descent and L-BFGS, applied in a coding-oblivious manner, deterministically achieve sample path linear convergence to an approximate solution of the original problem, using an arbitrarily varying subset of the nodes at each iteration.

### Fundamental Limits of Approximate Gradient Coding

• Computer Science
Proc. ACM Meas. Anal. Comput. Syst.
• 2019
Two approximate gradient coding schemes that exactly match such lower bounds based on random edge removal process are proposed, which provide order-wise improvement over the state of the art in terms of computation load, and are also optimal in both computation load and latency.

### Gradient Coding: Avoiding Stragglers in Distributed Learning

• Computer Science
ICML
• 2017
We propose a novel coding theoretic framework for mitigating stragglers in distributed learning. We show how carefully replicating data blocks and coding across gradients can provide tolerance to…