# An Empirical Study on Compressed Decentralized Stochastic Gradient Algorithms with Overparameterized Models

```bibtex
@article{Rao2021AnES,
  title   = {An Empirical Study on Compressed Decentralized Stochastic Gradient Algorithms with Overparameterized Models},
  author  = {Arjun Ashok Rao and Hoi-To Wai},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2110.04523}
}
```

This paper considers decentralized optimization with application to machine learning on graphs. The growing size of neural network (NN) models has motivated prior works on decentralized stochastic gradient algorithms to incorporate communication compression. On the other hand, recent works have demonstrated the favorable convergence and generalization properties of overparameterized NNs. In this work, we present an empirical analysis on the performance of compressed decentralized stochastic…

#### References

Showing 1–10 of 32 references.

Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent

- Computer Science, Mathematics
- NIPS
- 2017

This paper studies the D-PSGD algorithm and provides the first theoretical analysis identifying a regime in which decentralized algorithms can outperform centralized ones for distributed stochastic gradient descent.
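The D-PSGD update can be sketched on a toy problem (quadratic local objectives and a ring topology are our own illustrative choices, not the paper's experiments): each worker averages its neighbors' models through a doubly stochastic mixing matrix, then takes a local gradient step.

```python
import numpy as np

def dpsgd_step(x, W, grads, lr):
    """One D-PSGD round: average neighbors' models via the doubly
    stochastic mixing matrix W, then take a local gradient step.
    x and grads have shape (n_workers, dim)."""
    return W @ x - lr * grads

# Toy setup: worker i minimizes f_i(x) = 0.5*||x - t_i||^2 on a ring.
rng = np.random.default_rng(0)
n, d = 4, 3
targets = rng.normal(size=(n, d))
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5                 # self weight
    W[i, (i - 1) % n] = 0.25      # left neighbor
    W[i, (i + 1) % n] = 0.25      # right neighbor

x = np.zeros((n, d))
for _ in range(500):
    x = dpsgd_step(x, W, x - targets, lr=0.1)

# The average iterate follows centralized SGD on the global objective,
# so it converges to the average of the local minimizers.
print(np.allclose(x.mean(axis=0), targets.mean(axis=0)))
```

With a constant step size and heterogeneous local objectives, individual workers settle near (but not exactly at) the consensus point; the network-wide average is what matches the centralized trajectory.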

Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication

- Computer Science, Mathematics
- ICML
- 2019

This work presents a novel gossip-based stochastic gradient descent algorithm, CHOCO-SGD, that converges at rate $\mathcal{O}\left(1/(nT) + 1/(T \delta^2 \omega)^2\right)$ for strongly convex objectives, where $T$ denotes the number of iterations, $\delta$ the eigengap of the connectivity matrix, and $\omega$ the quality of the compression operator.
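The core CHOCO-SGD idea can be sketched as follows (a minimal illustration with toy quadratics, top-k compression, and step sizes we chose ourselves, not the paper's tuned values): each worker maintains a publicly known copy of its model, transmits only a compressed difference against that copy, and gossips on the copies rather than on the raw models.

```python
import numpy as np

def top_k(v, k):
    """Biased top-k compressor: keep the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

rng = np.random.default_rng(1)
n, d, k = 4, 10, 3
targets = rng.normal(size=(n, d))     # worker i minimizes 0.5*||x - t_i||^2
W = np.zeros((n, n))                  # ring topology mixing matrix
for i in range(n):
    W[i, i], W[i, (i - 1) % n], W[i, (i + 1) % n] = 0.5, 0.25, 0.25

eta, gamma = 0.05, 0.1                # gradient and gossip step sizes
x = np.zeros((n, d))                  # private models
x_hat = np.zeros((n, d))              # publicly known copies of the models
for _ in range(2000):
    x = x - eta * (x - targets)       # local gradient step
    for i in range(n):                # broadcast only a compressed change
        x_hat[i] += top_k(x[i] - x_hat[i], k)
    x = x + gamma * (W @ x_hat - x_hat)   # gossip on the shared copies

# Compression cancels out on average, so the mean iterate still
# converges to the minimizer of the global objective.
print(np.allclose(x.mean(axis=0), targets.mean(axis=0)))
```

Because $W$ is doubly stochastic, the gossip term has zero network-wide mean, so compression perturbs consensus but not the averaged trajectory; this is the mechanism behind the $1/(nT)$ leading term.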

A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free!

- Computer Science, Mathematics
- AISTATS
- 2021

This work proposes a new randomized first-order method that tackles the communication bottleneck by applying randomized compression operators to the transmitted messages, and obtains the first scheme that converges linearly on strongly convex decentralized problems while using only compressed communication.

Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: Joint Gradient Estimation and Tracking

- Computer Science
- ICML
- 2020

This work proposes an algorithm named D-GET (decentralized gradient estimation and tracking), which jointly performs decentralized gradient estimation (which estimates the local gradient using a subset of local samples) and gradient tracking (which tracks the global full gradient using local estimates).
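The gradient-tracking half of this scheme follows the generic template (as in DIGing/NEXT-style methods); the sketch below uses exact gradients on a toy quadratic, whereas D-GET additionally substitutes variance-reduced local estimates. Each worker maintains a tracker $y$ whose network average provably equals the average of the current local gradients.

```python
import numpy as np

def grad(x, targets):
    """Local gradients for f_i(x) = 0.5*||x - t_i||^2."""
    return x - targets

rng = np.random.default_rng(2)
n, d = 4, 3
targets = rng.normal(size=(n, d))
W = np.zeros((n, n))                  # ring topology mixing matrix
for i in range(n):
    W[i, i], W[i, (i - 1) % n], W[i, (i + 1) % n] = 0.5, 0.25, 0.25

alpha = 0.2
x = np.zeros((n, d))
y = grad(x, targets)                  # tracker initialized with local gradients
for _ in range(300):
    x_new = W @ x - alpha * y         # consensus step along the tracked direction
    y = W @ y + grad(x_new, targets) - grad(x, targets)   # update the tracker
    x = x_new

# Gradient tracking removes the heterogeneity bias: every worker
# converges to the minimizer of the *global* objective.
print(np.allclose(x, targets.mean(axis=0), atol=1e-6))
```

The tracker's defining invariant, mean(y) = mean of current local gradients, is preserved exactly by every iteration because $W$ is doubly stochastic; this is what lets the method converge with a constant step size where plain decentralized SGD cannot.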

Communication Compression for Decentralized Training

- Computer Science, Mathematics
- NeurIPS
- 2018

This paper develops a framework for quantized, decentralized training and proposes two strategies, extrapolation compression and difference compression, which significantly outperform the best of purely decentralized and purely quantized algorithms on networks with high latency and low bandwidth.
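The difference-compression idea can be illustrated on a single link (the quantizer, step size, and setup below are our own simplification, not the paper's full scheme): both ends maintain an agreed-upon replica, and only the quantized change to the model is transmitted, so the replica's error contracts as training converges.

```python
import numpy as np

def quantize(v, levels=8):
    """Crude uniform quantizer: round each entry to a grid of `levels`
    evenly spaced values spanning [-max|v|, max|v|]."""
    scale = np.max(np.abs(v)) + 1e-12
    step = 2 * scale / (levels - 1)
    return np.round(v / step) * step

rng = np.random.default_rng(3)
d = 50
target = rng.normal(size=d)
x = np.zeros(d)        # sender's true model
x_ref = np.zeros(d)    # replica that sender and receiver agree on
for _ in range(200):
    x = x - 0.1 * (x - target)    # local gradient step
    msg = quantize(x - x_ref)     # difference compression: send only the change
    x_ref = x_ref + msg           # both ends apply the same update

print(np.linalg.norm(x - x_ref))  # replica error shrinks toward zero
```

Quantizing differences instead of raw models means the quantization step scales with how much the model moved; as the iterates converge, the transmitted differences (and hence the quantization error) vanish rather than accumulating.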

Decentralized Deep Learning with Arbitrary Communication Compression

- Computer Science, Mathematics
- ICLR
- 2020

The use of communication compression in the decentralized training context achieves linear speedup in the number of workers and supports higher compression than previous state-of-the-art methods.

The Convergence of Sparsified Gradient Methods

- Computer Science, Mathematics
- NeurIPS
- 2018

It is proved that, under analytic assumptions, sparsifying gradients by magnitude with local error correction provides convergence guarantees, for both convex and non-convex smooth objectives, for data-parallel SGD.
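Magnitude sparsification with local error correction can be sketched as follows (a toy single-worker quadratic with parameters of our own choosing): coordinates dropped by the top-k compressor are banked in a local memory and re-added before the next compression, so no coordinate's contribution is lost permanently.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

rng = np.random.default_rng(4)
d, k, lr = 100, 5, 0.02
target = rng.normal(size=d)
x = np.zeros(d)
memory = np.zeros(d)                    # accumulates what compression dropped
for _ in range(5000):
    g = x - target                      # gradient of 0.5*||x - target||^2
    update = top_k(memory + lr * g, k)  # sparsify the error-corrected step
    memory += lr * g - update           # remember the coordinates we dropped
    x = x - update

print(np.linalg.norm(x - target))       # small despite 95% sparsification
```

Without the memory term, coordinates whose gradients are persistently small would never survive the top-k filter; the error accumulator guarantees each coordinate's accumulated update is eventually applied, which is the mechanism behind the convergence guarantees.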

Understanding Top-k Sparsification in Distributed Deep Learning

- Computer Science, Mathematics
- ArXiv
- 2019

The work exploits the distribution of gradient values to propose an approximate top-$k$ selection algorithm that is computationally efficient on GPUs, improving the scaling efficiency of TopK-SGD by significantly reducing the computation overhead.
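The thresholding idea behind approximate selection can be sketched as follows (a simplified stand-in: we estimate the magnitude threshold from a random subsample rather than from a fitted gradient distribution as the paper does; the function name and parameters are our own):

```python
import numpy as np

def approx_top_k(v, k, n_probe=2000, rng=None):
    """Approximate top-k selection: estimate the magnitude threshold
    from a random subsample of entries instead of sorting all of v,
    then keep every entry whose magnitude clears the threshold."""
    if rng is None:
        rng = np.random.default_rng()
    probe = np.abs(rng.choice(v, size=min(n_probe, v.size), replace=False))
    thresh = np.quantile(probe, 1.0 - k / v.size)
    out = np.zeros_like(v)
    mask = np.abs(v) >= thresh
    out[mask] = v[mask]
    return out

rng = np.random.default_rng(6)
g = rng.normal(size=100_000)           # stand-in for a flattened gradient
sparse = approx_top_k(g, k=1000, rng=rng)
print(np.count_nonzero(sparse))        # approximately k entries survive
```

A single threshold comparison over the tensor parallelizes trivially on a GPU, whereas exact top-k requires a sort or selection pass; the trade-off is that the number of retained entries is only approximately $k$.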

signSGD: compressed optimisation for non-convex problems

- Computer Science, Mathematics
- ICML
- 2018

signSGD can get the best of both worlds, compressed gradients and an SGD-level convergence rate, and its momentum counterpart (Signum) is able to match the accuracy and convergence speed of Adam on deep ImageNet models.
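A toy sketch of the one-bit idea with majority-vote aggregation (our own quadratic objectives; the momentum of Signum is omitted): each worker transmits only the sign of each gradient coordinate, and the server steps along the majority sign.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, lr = 3, 20, 0.01               # 3 workers, 20 parameters
targets = rng.normal(size=(n, d))    # worker i minimizes 0.5*||x - t_i||^2
x = np.zeros(d)
for _ in range(5000):
    local_signs = np.sign(x - targets)       # each worker sends 1 bit per coord
    vote = np.sign(local_signs.sum(axis=0))  # server aggregates by majority vote
    x = x - lr * vote

# With majority voting the iterate settles near the coordinate-wise
# median of the workers' minimizers, within one step size per coordinate.
print(np.max(np.abs(x - np.median(targets, axis=0))) <= lr)
```

Since the step magnitude is fixed at the learning rate, the iterate oscillates in a band of width one step around the attractor; in practice this is handled by decaying the learning rate.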

NEXT: In-Network Nonconvex Optimization

- Computer Science, Mathematics
- IEEE Transactions on Signal and Information Processing over Networks
- 2016

This work introduces the first algorithmic framework for distributed minimization of the sum of a smooth function (the agents' sum-utility) and a convex (possibly nonsmooth and nonseparable) regularizer, and shows that the new method compares favorably to existing distributed algorithms on both convex and nonconvex problems.