# BEER: Fast O(1/T) Rate for Decentralized Nonconvex Optimization with Communication Compression

@article{Zhao2022BEERFO, title={BEER: Fast O(1/T) Rate for Decentralized Nonconvex Optimization with Communication Compression}, author={Haoyu Zhao and Boyue Li and Zhize Li and Peter Richt{\'a}rik and Yuejie Chi}, journal={ArXiv}, year={2022}, volume={abs/2201.13320} }

Communication efficiency has been widely recognized as the bottleneck for large-scale decentralized machine learning applications in multi-agent or federated environments. To tackle the communication bottleneck, there have been many efforts to design communication-compressed algorithms for decentralized nonconvex optimization, where the clients are only allowed to communicate a small amount of quantized information (aka bits) with their neighbors over a predefined graph topology. Despite significant…

## 7 Citations

### CEDAS: A Compressed Decentralized Stochastic Gradient Method with Improved Convergence

- Computer Science, ArXiv
- 2023


### A Multi-Token Coordinate Descent Method for Vertical Federated Learning

- Computer Science
- 2022

This work formalizes the multi-token semi-decentralized scheme, which subsumes the client-server and decentralized setups, and designs a feature-distributed learning algorithm for this setting, which can be viewed as a parallel Markov chain (block) coordinate descent algorithm.

### Coresets for Vertical Federated Learning: Regularized Linear Regression and K-Means Clustering

- Computer Science, ArXiv
- 2022

This paper proposes a coreset framework that constructs coresets in a distributed fashion for communication-efficient VFL, and theoretically shows that using coresets can drastically reduce the communication complexity while nearly maintaining the solution quality.

### Simple and Optimal Stochastic Gradient Methods for Nonsmooth Nonconvex Optimization

- Computer Science, ArXiv
- 2022

This work proposes and analyzes several stochastic gradient algorithms for finding stationary points or local minima in nonconvex, finite-sum and online optimization problems, possibly with a nonsmooth regularizer, and proposes an optimal algorithm, called SSRGD, based on SARAH, which can find an $\epsilon$-approximate (first-order) stationary point by simply adding random perturbations.

### SoteriaFL: A Unified Framework for Private Federated Learning with Communication Compression

- Computer Science, ArXiv
- 2022

A framework called SoteriaFL is proposed, which accommodates a general family of local gradient estimators, including popular stochastic variance-reduced gradient methods and the state-of-the-art shifted compression scheme, and is shown to achieve better communication complexity, without sacrificing privacy or utility, than private federated learning algorithms without communication compression.

### Lower Bounds and Nearly Optimal Algorithms in Distributed Learning with Communication Compression

- Computer Science, ArXiv
- 2022

A convergence lower bound is established for algorithms using unbiased or contractive compressors, with either unidirectional or bidirectional communication, and an algorithm, NEOLITHIC, is proposed that nearly matches the lower bound (up to logarithmic factors) under mild conditions.

### DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization

- Computer Science, SIAM J. Math. Data Sci.
- 2022

A new algorithm, called DEcentralized STochastic REcurSive gradient methodS (DESTRESS), for nonconvex optimization, which matches the optimal incremental first-order oracle complexity of centralized algorithms for finding stationary points while maintaining communication efficiency.

## References

Showing 1–10 of 64 references.

### SQuARM-SGD: Communication-Efficient Momentum SGD for Decentralized Optimization

- Computer Science, IEEE Journal on Selected Areas in Information Theory
- 2021

Theoretical understanding is corroborated with experiments, and the performance of the algorithm is compared with the state-of-the-art, showing that without sacrificing much accuracy, SQuARM-SGD converges at a similar rate while saving significantly in total communicated bits.

### Decentralized Deep Learning with Arbitrary Communication Compression

- Computer Science, ICLR
- 2020

The use of communication compression in the decentralized training context achieves linear speedup in the number of workers and supports higher compression than previous state-of-the art methods.

### DeepSqueeze: Decentralization Meets Error-Compensated Compression

- Computer Science
- 2019

This paper proposes an algorithmic design that employs error-compensated stochastic gradient descent in the decentralized scenario, named DeepSqueeze, and is the first to apply error-compensated compression to decentralized learning.

### QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding

- Computer Science, NIPS
- 2017

Quantized SGD is proposed, a family of compression schemes for gradient updates which provides convergence guarantees and leads to significant reductions in end-to-end training time, and can be extended to stochastic variance-reduced techniques.
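The stochastic quantization at the heart of QSGD can be sketched in a few lines. This is a minimal single-vector version with `s` uniform levels; the function name and defaults are illustrative, not from the paper:

```python
import numpy as np

def qsgd_quantize(v, s=4, rng=None):
    """Unbiased stochastic s-level quantization in the spirit of QSGD.

    Each coordinate's magnitude (relative to ||v||) is randomly rounded
    to one of s+1 levels so that E[Q(v)] = v.
    """
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    scaled = np.abs(v) / norm * s          # position on the level grid
    lower = np.floor(scaled)
    # round up with probability equal to the fractional part (unbiasedness)
    level = lower + (rng.random(v.shape) < scaled - lower)
    return norm * np.sign(v) * level / s
```

Only the norm, the signs, and the integer level indices need transmitting, which is where the bit savings over full-precision gradients come from.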

### Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication

- Computer Science, ICML
- 2019

This work presents a novel gossip-based stochastic gradient descent algorithm, CHOCO-SGD, that converges at rate $\mathcal{O}\left(1/(nT) + 1/(T \delta^2 \omega)^2\right)$ for strongly convex objectives, where $T$ denotes the number of iterations and $\delta$ the eigengap of the connectivity matrix.

### Sparsified SGD with Memory

- Computer Science, NeurIPS
- 2018

This work analyzes Stochastic Gradient Descent with k-sparsification or compression (for instance top-k or random-k) and shows that this scheme converges at the same rate as vanilla SGD when equipped with error compensation.
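The memory mechanism can be sketched as follows (a single-worker sketch; the function names and the quadratic test problem are illustrative, not from the paper):

```python
import numpy as np

def topk(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def sparsified_sgd_step(x, grad, memory, lr=0.1, k=1):
    """One step of sparsified SGD with error compensation: add the
    residual from previous rounds back in before compressing, apply
    only the sparse part, and store the new residual locally."""
    corrected = lr * grad + memory   # error-compensated update
    sparse = topk(corrected, k)      # the only part that is communicated
    return x - sparse, corrected - sparse
```

The residual memory ensures that no gradient information is permanently discarded, which is the key to matching the vanilla SGD rate despite aggressive sparsification.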

### Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent

- Computer Science, NIPS
- 2017

This paper studies a D-PSGD algorithm and provides the first theoretical analysis that indicates a regime in which decentralized algorithms might outperform centralized algorithms for distributed stochastic gradient descent.
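One synchronous D-PSGD round can be sketched as a mixing step followed by local gradient steps. The scalar models, complete-graph mixing matrix, and toy objectives below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def dpsgd_round(x, grads, W, lr=0.05):
    """One D-PSGD round: every node averages its neighbors' models
    through the doubly stochastic mixing matrix W, then applies its
    own local stochastic gradient."""
    return W @ x - lr * grads

# Toy run: 3 nodes with local objectives f_i(x) = 0.5 * (x - b_i)^2,
# so the local gradient at node i is x_i - b_i.
b = np.array([1.0, 2.0, 3.0])
W = np.full((3, 3), 1.0 / 3.0)    # uniform averaging over a complete graph
x = np.zeros(3)
for _ in range(200):
    x = dpsgd_round(x, x - b, W)
```

With a fixed step size the nodes reach approximate consensus near the minimizer of the average objective (here `mean(b) = 2`), with a residual spread on the order of the step size.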

### EF21: A New, Simpler, Theoretically Better, and Practically Faster Error Feedback

- Computer Science, NeurIPS
- 2021

It is proved that EF21 enjoys a fast $O(1/T)$ convergence rate for smooth nonconvex problems, beating the previous bound of $O(1/T^{2/3})$, which was shown under a strong bounded-gradients assumption.
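The EF21 recursion can be sketched for a single worker with a contractive top-k compressor. This is an illustrative single-node sketch, not the paper's multi-node pseudocode:

```python
import numpy as np

def topk(v, k):
    """A contractive compressor: keep the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef21(grad_fn, x0, lr=0.1, k=1, steps=300):
    """EF21 maintains a gradient estimate g and, instead of an error
    buffer, compresses the *difference* between the fresh gradient and
    g, so that g tracks the true gradient over time."""
    x, g = x0.copy(), grad_fn(x0)   # g initialized to the exact gradient
    for _ in range(steps):
        x = x - lr * g
        g = g + topk(grad_fn(x) - g, k)
    return x
```

Compressing the difference rather than the gradient itself is what removes the bounded-gradients assumption: the quantity being compressed shrinks as the iterates stabilize.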

### A Compressed Gradient Tracking Method for Decentralized Optimization With Linear Convergence

- Computer Science, IEEE Transactions on Automatic Control
- 2022

It is shown that C-GT inherits the advantages of gradient tracking-based algorithms and achieves linear convergence rate for strongly convex and smooth objective functions.

### Fast Decentralized Nonconvex Finite-Sum Optimization with Recursive Variance Reduction

- Computer Science, Mathematics, SIAM J. Optim.
- 2022

Over infinite time horizon, it is established that all nodes in GT-SARAH asymptotically achieve consensus and converge to a first-order stationary point in the almost sure and mean-squared sense.