Corpus ID: 207757266

SPARQ-SGD: Event-Triggered and Compressed Communication in Decentralized Stochastic Optimization

@article{Singh2019SPARQSGDEA,
  title={SPARQ-SGD: Event-Triggered and Compressed Communication in Decentralized Stochastic Optimization},
  author={Navjot Singh and Deepesh Data and Jemin George and S. Diggavi},
  journal={ArXiv},
  year={2019},
  volume={abs/1910.14280}
}
In this paper, we propose and analyze SPARQ-SGD, which is an event-triggered and compressed algorithm for decentralized training of large-scale machine learning models. Each node can locally compute a condition (event) which triggers a communication where quantized and sparsified local model parameters are sent. In SPARQ-SGD each node takes at least a fixed number ($H$) of local gradient steps and then checks if the model parameters have significantly changed compared to its last update; it…
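The abstract above outlines the mechanism: at least $H$ local gradient steps between checks, an event trigger based on how far the local model has drifted from the copy last communicated, and quantized-plus-sparsified updates exchanged over the network graph. Below is a minimal NumPy sketch of such a loop; the toy quadratic objectives, ring topology, top-k-with-sign compressor, and all parameter values are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of event-triggered, compressed decentralized SGD.
# Everything here (toy objectives, ring topology, compressor, constants)
# is an illustrative assumption, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)
n, d, H, T = 4, 10, 5, 100              # nodes, model dimension, local steps, rounds
lr, gamma, thresh, k = 0.05, 0.4, 1e-4, 3
targets = rng.standard_normal((n, d))   # node-local optima (heterogeneous data)

def local_gradient(i, x):
    # Stochastic gradient of node i's toy objective 0.5 * ||x - targets[i]||^2.
    return (x - targets[i]) + 0.01 * rng.standard_normal(d)

def compress(v, k):
    # Top-k sparsification with a shared magnitude (one possible compressor).
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = np.sign(v[idx]) * np.mean(np.abs(v[idx]))
    return out

# Symmetric, doubly stochastic mixing matrix for a ring of n nodes.
W = np.zeros((n, n))
for i in range(n):
    W[i, i], W[i, (i - 1) % n], W[i, (i + 1) % n] = 0.5, 0.25, 0.25

x = np.zeros((n, d))        # local models
x_hat = np.zeros((n, d))    # copies last communicated to neighbors

for t in range(T):
    for i in range(n):
        for _ in range(H):                          # at least H local gradient steps
            x[i] -= lr * local_gradient(i, x[i])
    for i in range(n):                              # event-triggered, compressed exchange
        if np.linalg.norm(x[i] - x_hat[i]) ** 2 > thresh:
            x_hat[i] += compress(x[i] - x_hat[i], k)
    x += gamma * (W @ x_hat - x_hat)                # consensus step on the public copies

print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
```

Only the compressed difference would actually be transmitted, and a node whose model has not moved past the threshold since its last update skips communication for that round entirely.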
Citations

SPARQ-SGD: Event-Triggered and Compressed Communication in Decentralized Optimization
TLDR: It is demonstrated that aggressive compression, including event-triggered communication, model sparsification, and quantization, does not affect the overall convergence rate compared to uncompressed decentralized training, thereby theoretically yielding communication efficiency for ‘free’.
Q-GADMM: Quantized Group ADMM for Communication Efficient Decentralized Machine Learning
TLDR: A novel stochastic quantization method is developed to adaptively adjust model quantization levels and their probabilities, and the convergence of Q-GADMM is proved for convex objective functions.
Decentralized Federated Learning via SGD over Wireless D2D Networks
Hong Xing, O. Simeone, S. Bi. 2020 IEEE 21st International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), 2020.
TLDR: Wireless protocols are proposed that implement Decentralized Stochastic Gradient Descent by accounting for the presence of path loss, fading, blockages, and mutual interference in the deployment of Federated Learning.
Shuffled Model of Differential Privacy in Federated Learning
TLDR: For convex loss functions, it is proved that the proposed CLDP-SGD algorithm matches the known lower bounds on the centralized private ERM while using a finite number of bits per iteration for each client, i.e., effectively getting communication efficiency for “free”.
Shuffled Model of Federated Learning: Privacy, Accuracy and Communication Trade-Offs
TLDR: This work demonstrates that one can get the same privacy, optimization-performance operating point developed in recent methods that use full-precision communication, but at a much lower communication cost, i.e., effectively getting communication efficiency for “free”.
Federated Learning over Wireless Device-to-Device Networks: Algorithms and Convergence Analysis
TLDR: This paper introduces generic digital and analog wireless implementations of communication-efficient DSGD algorithms, leveraging random linear coding (RLC) for compression and over-the-air computation (AirComp) for simultaneous analog transmissions, and provides convergence bounds for both implementations.
Randomized Reactive Redundancy for Byzantine Fault-Tolerance in Parallelized Learning
TLDR: This report considers the problem of Byzantine fault-tolerance in synchronous parallelized learning that is founded on the parallelized stochastic gradient descent (parallelized-SGD) algorithm and proposes two coding schemes, a deterministic scheme and a randomized scheme, for guaranteeing exact fault-tolerance if $2f < n$.
Decentralized Langevin Dynamics for Bayesian Learning
TLDR: This work proposes a collaborative Bayesian learning algorithm taking the form of decentralized Langevin dynamics in a non-convex setting and shows that the initial KL-divergence between the Markov chain and the target posterior distribution is exponentially decreasing, while the error contributions from the additive noise are decreasing in polynomial time.
A Decentralized Approach to Bayesian Learning
TLDR: This work proposes a collaborative Bayesian learning algorithm taking the form of decentralized Langevin dynamics in a non-convex setting and shows that the initial KL-divergence between the Markov chain and the target posterior distribution is exponentially decreasing, while the error contributions from the additive noise are decreasing in polynomial time.
Communication Efficient Distributed Learning with Censored, Quantized, and Generalized Group ADMM
TLDR: Numerical simulations corroborate that CQ-GGADMM exhibits higher communication efficiency in terms of the number of communication rounds and transmit energy consumption, without compromising accuracy and convergence speed, compared to benchmark schemes based on decentralized ADMM without censoring, quantization, and/or the worker grouping method of GADMM.

References

Showing 1-10 of 34 references
Decentralized Deep Learning with Arbitrary Communication Compression
TLDR: The use of communication compression in the decentralized training context achieves linear speedup in the number of workers and supports higher compression than previous state-of-the-art methods.
Qsparse-Local-SGD: Distributed SGD With Quantization, Sparsification, and Local Computations
TLDR: This paper proposes the Qsparse-local-SGD algorithm, which combines aggressive sparsification with quantization and local computation along with error compensation, by keeping track of the difference between the true and compressed gradients, and demonstrates that it converges at the same rate as vanilla distributed SGD for many important classes of sparsifiers and quantizers.
Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication
TLDR: This work presents a novel gossip-based stochastic gradient descent algorithm, CHOCO-SGD, that converges at rate $\mathcal{O}\left(1/(nT) + 1/(T \delta^2 \omega)^2\right)$ for strongly convex objectives, where $T$ denotes the number of iterations, $\delta$ the eigengap of the connectivity matrix, and $\omega$ the compression ratio.
QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding
TLDR: Quantized SGD is proposed, a family of compression schemes for gradient updates which provides convergence guarantees and leads to significant reductions in end-to-end training time, and can be extended to stochastic variance-reduced techniques. (A minimal sketch of this style of stochastic quantizer is given after this reference list.)
Communication Compression for Decentralized Training
TLDR: This paper develops a framework of quantized, decentralized training and proposes two different strategies, called extrapolation compression and difference compression, which outperform the best of merely decentralized and merely quantized algorithms significantly for networks with high latency and low bandwidth.
Local SGD Converges Fast and Communicates Little
TLDR: Concise convergence rates are proved for local SGD on convex problems, showing that it converges at the same rate as mini-batch SGD in terms of the number of evaluated gradients; that is, the scheme achieves linear speedup in the number of workers and mini-batch size.
Sparsified SGD with Memory
TLDR: This work analyzes Stochastic Gradient Descent with k-sparsification or compression (for instance top-k or random-k) and shows that this scheme converges at the same rate as vanilla SGD when equipped with error compensation. (A sketch of top-k sparsification with error compensation is given after this reference list.)
Stochastic Gradient Push for Distributed Deep Learning
TLDR: Stochastic Gradient Push (SGP) is studied; it is proved that SGP converges to a stationary point of smooth, non-convex objectives at the same sub-linear rate as SGD, and that all nodes achieve consensus.
On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization
TLDR: This paper considers a distributed communication-efficient momentum SGD method and proves its linear speedup property, filling the gap in the study of distributed SGD variants with reduced communication.
Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent
TLDR: This paper studies a D-PSGD algorithm and provides the first theoretical analysis that indicates a regime in which decentralized algorithms might outperform centralized algorithms for distributed stochastic gradient descent.
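As promised under the QSGD entry above, here is a minimal sketch of an unbiased s-level stochastic quantizer of the kind that paper analyzes: coordinates are scaled by the vector norm and randomly rounded to one of the two nearest grid points so that the result equals the input in expectation. The function name and the unbiasedness check are illustrative assumptions, not the paper's code.

```python
# Sketch of unbiased stochastic quantization with s levels (QSGD-style).
import numpy as np

def stochastic_quantize(v, s, rng):
    """Randomly round each coordinate of v to an s-level grid; unbiased in expectation."""
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    scaled = s * np.abs(v) / norm            # each entry lies in [0, s]
    lower = np.floor(scaled)
    prob_up = scaled - lower                 # probability of rounding up
    levels = lower + (rng.random(v.shape) < prob_up)
    return norm * np.sign(v) * levels / s

rng = np.random.default_rng(0)
g = rng.standard_normal(1000)
# Averaging many independent quantizations should approach g (unbiasedness).
avg = np.mean([stochastic_quantize(g, s=4, rng=rng) for _ in range(2000)], axis=0)
print("max deviation of the empirical mean from g:", np.max(np.abs(avg - g)))
```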
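And as noted under the 'Sparsified SGD with Memory' entry, the following is a minimal single-node sketch of top-k sparsified SGD with error compensation: whatever the compressor drops is kept in a memory vector and added back before the next compression. The toy objective, step size, and names are illustrative assumptions, not the paper's implementation.

```python
# Sketch of top-k sparsified SGD with error compensation ("memory").
import numpy as np

def top_k(v, k):
    # Keep the k largest-magnitude entries of v, zero out the rest.
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

rng = np.random.default_rng(0)
d, k, lr, T = 50, 5, 0.1, 500
target = rng.standard_normal(d)   # optimum of a toy quadratic objective
x = np.zeros(d)
memory = np.zeros(d)              # accumulates what compression discarded

for t in range(T):
    g = (x - target) + 0.01 * rng.standard_normal(d)   # stochastic gradient
    corrected = lr * g + memory                        # add back the past residual
    update = top_k(corrected, k)                       # compressed step actually applied
    memory = corrected - update                        # remember what was dropped
    x -= update

print("distance to optimum:", np.linalg.norm(x - target))
```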