LAGC: Lazily Aggregated Gradient Coding for Straggler-Tolerant and Communication-Efficient Distributed Learning

@article{Zhang2021LAGCLA,
  title={LAGC: Lazily Aggregated Gradient Coding for Straggler-Tolerant and Communication-Efficient Distributed Learning},
  author={Jingjing Zhang and Osvaldo Simeone},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  year={2021},
  volume={32},
  pages={962-974}
}
  • Jingjing Zhang, O. Simeone
  • Published 22 May 2019
  • Computer Science
  • IEEE Transactions on Neural Networks and Learning Systems
Gradient-based distributed learning in parameter server (PS) computing architectures is subject to random delays due to straggling worker nodes and to possible communication bottlenecks between PS and workers. Solutions have been recently proposed to separately address these impairments based on the ideas of gradient coding (GC), worker grouping, and adaptive worker selection. This article provides a unified analysis of these techniques in terms of wall-clock time, communication, and… 
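The abstract points to worker grouping and gradient coding as the straggler-tolerance ingredients that LAGC combines with lazy aggregation. Below is a minimal, self-contained sketch of just the grouping idea on a toy least-squares problem (my own illustration under simplified assumptions, not the paper's LAGC scheme; the replication layout and names such as `partial_gradient` are illustrative): every worker in a group stores the same data block, so the parameter server only needs the fastest reply from each group. The lazy-aggregation ingredient, which skips uploads whose gradients have changed little, is sketched separately after the LAG reference further down.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: the full gradient is X.T @ (X @ theta - y).
d, n_samples = 4, 60
X, y = rng.normal(size=(n_samples, d)), rng.normal(size=n_samples)
theta = np.zeros(d)

n_workers, s = 6, 1                      # tolerate up to s stragglers per group
group_size = s + 1                       # replication factor within a group
n_groups = n_workers // group_size
blocks = np.array_split(np.arange(n_samples), n_groups)

def partial_gradient(rows, theta):
    Xb, yb = X[rows], y[rows]
    return Xb.T @ (Xb @ theta - yb)

# Every worker in group g holds the same data block, so the parameter server
# only needs the first reply from each group and can ignore the slowest
# worker of every group.
replies = []
for g in range(n_groups):
    group = list(range(g * group_size, (g + 1) * group_size))
    fastest = int(rng.choice(group))     # pretend this worker replied first
    print(f"group {g}: reply taken from worker {fastest}")
    replies.append(partial_gradient(blocks[g], theta))

assert np.allclose(sum(replies), X.T @ (X @ theta - y))
```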

Citations

Adaptive Worker Grouping for Communication-Efficient and Straggler-Tolerant Distributed SGD
TLDR
A novel scheme named grouping-based CADA (G-CADA) is proposed that retains the advantages of CADA in reducing the communication load, while increasing the robustness to stragglers at the cost of additional storage at the workers.
Straggler-Aware Distributed Learning: Communication–Computation Latency Trade-Off
TLDR
This work considers multi-message communication (MMC) by allowing multiple computations to be conveyed from each worker per iteration, and proposes novel straggler avoidance techniques for both coded computation and coded communication with MMC.
Gradient Coding with Dynamic Clustering for Straggler-Tolerant Distributed Learning
TLDR
A novel paradigm of dynamic coded computation, called GC-DC, is introduced: redundant data is assigned to workers so that the scheme can dynamically choose from among a set of possible codes depending on past straggling behavior, and the number of stragglers in each cluster is regulated by dynamically forming the clusters at each iteration.
Joint Dynamic Grouping and Gradient Coding for Time-critical Distributed Machine Learning in Heterogeneous Edge Networks
TLDR
This work proposes a novel scheme named Dynamic Grouping and Heterogeneity-aware Gradient Coding (DGH-GC) to tolerate stragglers by employing dynamic grouping and gradient coding, and proposes an algorithm called DGH-(GC)² to compress the transferred gradients in both upstream and downstream communication.
Coded Consensus Monte Carlo: Robust One-Shot Distributed Bayesian Learning with Stragglers
TLDR
This letter studies distributed Bayesian learning in a setting encompassing a central server and multiple workers, focusing on the problem of mitigating the impact of stragglers, and proposes two straggler-resilient solutions based on grouping and coding.
Lightweight Projective Derivative Codes for Compressed Asynchronous Gradient Descent
TLDR
This paper proposes a novel algorithm that encodes the partial derivatives themselves and further optimizes the codes by performing lossy compression on the derivative codewords, maximizing the information contained in the codeword while minimizing the information between the codewords.
Leveraging Spatial and Temporal Correlations in Sparsified Mean Estimation
TLDR
This work studies the problem of estimating, at a central server, the mean of a set of vectors distributed across several nodes (one vector per node), and provides an analysis of the resulting estimation error as well as experiments, which show that the proposed estimators consistently outperform more sophisticated and expensive sparsification methods.
Distributed Machine Learning for Wireless Communication Networks: Techniques, Architectures, and Applications
TLDR
The latest applications of DML in power control, spectrum management, user association, and edge cloud computing are reviewed, along with the potential adversarial attacks faced by DML applications, and state-of-the-art countermeasures to preserve privacy and security are described.
Ternary Compression for Communication-Efficient Federated Learning
TLDR
The proposed ternary federated averaging protocol (T-FedAvg) is effective in reducing communication costs and can even achieve slightly better performance on non-IID data compared with canonical federated learning algorithms.
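As a rough illustration of the ternary-compression idea summarized above, the following generic ternary quantizer (not necessarily T-FedAvg's exact rule; the threshold factor and scale are my assumptions) maps a weight vector to {-a, 0, +a} with a single shared scale, so an update costs roughly log2(3) bits per parameter plus one float.

```python
import numpy as np

def ternarize(w, thresh=0.7):
    """Generic ternary quantizer (an illustration of the idea, not necessarily
    T-FedAvg's exact rule): map each weight to {-a, 0, +a} with one shared scale a."""
    delta = thresh * np.mean(np.abs(w))          # values below delta are zeroed out
    mask = np.abs(w) > delta                     # which entries keep their sign
    scale = np.abs(w[mask]).mean() if mask.any() else 0.0
    return scale * np.sign(w) * mask

w = np.array([0.8, -0.05, 0.3, -0.9, 0.02])
print(ternarize(w))                              # approx. [ 0.67, 0, 0.67, -0.67, 0 ]
```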
Fusion of Federated Learning and Industrial Internet of Things: A Survey
...

References

SHOWING 1-10 OF 53 REFERENCES
Gradient Coding with Clustering and Multi-Message Communication
TLDR
It is numerically shown that the proposed GC with multi-message communication (MMC) together with clustering provides significant improvements in the average completion time (of each iteration), with minimal or no increase in the communication load.
LAG: Lazily Aggregated Gradient for Communication-Efficient Distributed Learning
TLDR
A new class of gradient methods for distributed machine learning is presented that adaptively skips gradient calculations to learn with reduced communication and computation, justifying the acronym LAG (lazily aggregated gradient).
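The lazy-aggregation rule summarized above is the "LA" ingredient that LAGC combines with gradient coding. A simplified sketch of such a skip test is given below (step sizes and other constants are folded into a single tunable `xi`; this is a paraphrase, not the paper's exact thresholds): a worker reuses its previously uploaded gradient when its local gradient has changed little relative to how much the model itself has been moving recently.

```python
import numpy as np

def lag_skip(grad_new, grad_last_sent, recent_model_diffs, xi=0.5, n_workers=10):
    """Simplified LAG-style skip test: return True when the worker can reuse
    its last uploaded gradient instead of communicating a new one."""
    change = np.sum((grad_new - grad_last_sent) ** 2)
    budget = xi / n_workers**2 * np.mean([np.sum(d**2) for d in recent_model_diffs])
    return change <= budget

# Example: the local gradient barely moved while the model changed a lot,
# so this round of communication can be skipped.
g_new, g_old = np.array([1.00, -0.50]), np.array([1.01, -0.49])
diffs = [np.array([0.3, -0.2]), np.array([0.25, -0.1])]
print(lag_skip(g_new, g_old, diffs))             # True
```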
Improving Distributed Gradient Descent Using Reed-Solomon Codes
TLDR
This work adopts the framework of Tandon et al. and presents a deterministic scheme that, for a prescribed per-machine computational effort, recovers the gradient from the least number of machines theoretically permissible, via an $O(f^2)$ decoding algorithm.
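The "framework of Tandon et al." mentioned in this summary is the linear gradient-coding framework that LAGC also builds on. The toy instance below (3 workers, 3 data partitions, any 2 replies suffice) uses the illustrative encoding matrix from that framework; decoding by least squares is my simplification of the dedicated decoders such schemes design.

```python
import numpy as np

# Tiny gradient-coding instance: 3 workers, 3 partitions, any 2 workers suffice.
B = np.array([[0.5, 1.0,  0.0],          # worker 0 sends 0.5*g1 + g2
              [0.0, 1.0, -1.0],          # worker 1 sends       g2 - g3
              [0.5, 0.0,  1.0]])         # worker 2 sends 0.5*g1 + g3

rng = np.random.default_rng(2)
g = rng.normal(size=(3, 5))              # partial gradients g1, g2, g3 (dim 5)
sent = B @ g                             # what each worker would transmit

for survivors in [(0, 1), (0, 2), (1, 2)]:
    # Find coefficients a with a @ B[survivors] = all-ones row, so that
    # a @ sent[survivors] equals the full gradient g1 + g2 + g3.
    a, *_ = np.linalg.lstsq(B[list(survivors)].T, np.ones(3), rcond=None)
    assert np.allclose(a @ sent[list(survivors)], g.sum(axis=0))
```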
Stochastic Gradient Coding for Straggler Mitigation in Distributed Learning
TLDR
This work proposes an approximate gradient coding scheme called SGC, which works when the stragglers are random, proves that the convergence rate of SGC mirrors that of batched Stochastic Gradient Descent (SGD) for the $\ell_2$ loss function, and shows how the convergence rate improves with the redundancy.
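A toy version of the approximate-gradient-coding idea described above might look as follows (the replication pattern, the 1/d scaling, and the straggler model are my simplifications, not necessarily SGC's exact construction): each sample is replicated to d workers, each worker returns a scaled partial sum, and the server simply adds whatever arrives.

```python
import numpy as np

rng = np.random.default_rng(3)

n_points, n_workers, d = 12, 4, 2            # every sample is replicated to d workers
assignment = [rng.choice(n_workers, size=d, replace=False) for _ in range(n_points)]
grads = rng.normal(size=(n_points, 5))       # stand-ins for per-sample gradients

def message(w):
    # Each worker sends the 1/d-scaled sum over the samples assigned to it.
    mine = [i for i, ws in enumerate(assignment) if w in ws]
    return grads[mine].sum(axis=0) / d

# With no stragglers the messages add up to the exact full gradient ...
assert np.allclose(sum(message(w) for w in range(n_workers)), grads.sum(axis=0))

# ... and with random stragglers the partial sum is the approximation whose
# convergence behavior this line of work analyzes.
alive = [w for w in range(n_workers) if rng.random() > 0.3]
approx_grad = sum(message(w) for w in alive)
```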
Slow and Stale Gradients Can Win the Race
TLDR
This work presents a novel theoretical characterization of the speed-up offered by asynchronous SGD methods by analyzing the trade-off between the error in the trained model and the actual training runtime (wall-clock time).
Rateless Codes for Near-Perfect Load Balancing in Distributed Matrix-Vector Multiplication
TLDR
This paper proposes a rateless fountain coding strategy that achieves the best of both worlds: it is proved that its latency is asymptotically equal to that of ideal load balancing, and it performs asymptotically zero redundant computations.
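A compact sketch of the rateless idea summarized above, assuming dense random combining coefficients for simplicity (the actual scheme uses a sparse LT-code degree distribution and a fast peeling decoder rather than least squares): any sufficiently large subset of responses determines A @ x, regardless of which workers straggle.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 8, 5
A, x = rng.normal(size=(m, n)), rng.normal(size=n)

# Fountain-style encoding: coded rows are random combinations of the rows of A
# (dense here for simplicity; the real scheme uses a sparse degree distribution).
n_coded = 20
C = rng.normal(size=(n_coded, m))
coded_rows = C @ A                           # pre-distributed to the workers

responses = coded_rows @ x                   # each worker returns one inner product
arrived = rng.permutation(n_coded)[:12]      # whichever 12 responses come back first

# Slightly more than m received responses determine A @ x; the rateless
# property is that it never matters *which* ones arrive.
Ax_hat, *_ = np.linalg.lstsq(C[arrived], responses[arrived], rcond=None)
assert np.allclose(Ax_hat, A @ x)
```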
Speeding Up Distributed Machine Learning Using Codes
TLDR
This paper focuses on two of the most basic building blocks of distributed learning algorithms: matrix multiplication and data shuffling, and uses codes to reduce communication bottlenecks, exploiting the excess in storage.
Communication-Efficient Distributed Dual Coordinate Ascent
TLDR
A communication-efficient framework that uses local computation in a primal-dual setting to dramatically reduce the amount of necessary communication is proposed, and a strong convergence rate analysis is provided for this class of algorithms.
QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding
TLDR
Quantized SGD (QSGD), a family of compression schemes for gradient updates, is proposed; it provides convergence guarantees, leads to significant reductions in end-to-end training time, and can be extended to stochastic variance-reduced techniques.
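A rough sketch of stochastic gradient quantization in the spirit of QSGD is shown below (my paraphrase with a simple uniform grid; the paper's encoding and variance bounds are not reproduced): coordinates are randomly rounded to a coarse grid scaled by the vector norm, which keeps the quantizer unbiased.

```python
import numpy as np

def quantize(v, levels=4, rng=None):
    """Stochastic uniform quantization in the spirit of QSGD: each coordinate is
    randomly rounded to one of `levels` evenly spaced magnitudes times ||v||,
    so only the sign, the norm, and small integers need to be transmitted."""
    rng = rng or np.random.default_rng(0)
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v.copy()
    scaled = np.abs(v) / norm * levels               # position on the quantization grid
    low = np.floor(scaled)
    round_up = rng.random(v.shape) < (scaled - low)  # probabilistic rounding => unbiased
    return norm * np.sign(v) * (low + round_up) / levels

v = np.array([0.3, -1.2, 0.05, 0.8])
print(quantize(v))                                   # entries lie on a coarse grid scaled by ||v||
```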
...