• Corpus ID: 246430634

# BEER: Fast O(1/T) Rate for Decentralized Nonconvex Optimization with Communication Compression

@article{Zhao2022BEERFO,
  title={BEER: Fast O(1/T) Rate for Decentralized Nonconvex Optimization with Communication Compression},
  author={Haoyu Zhao and Boyue Li and Zhize Li and Peter Richt{\'a}rik and Yuejie Chi},
  journal={ArXiv},
  year={2022},
  volume={abs/2201.13320}
}
• Published 31 January 2022
• Computer Science
• ArXiv
Communication efficiency has been widely recognized as the bottleneck for large-scale decentralized machine learning applications in multi-agent or federated environments. To tackle the communication bottleneck, there have been many efforts to design communication-compressed algorithms for decentralized nonconvex optimization, where the clients are only allowed to communicate a small amount of quantized information (aka bits) with their neighbors over a predefined graph topology. Despite significant…
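As a concrete illustration of the compressed communication the abstract refers to, here is a minimal numpy sketch of top-k sparsification, a standard contractive compressor of the kind such algorithms use (the function is my own sketch, not code from the paper):

```python
import numpy as np

def top_k(x, k):
    """Top-k sparsification: keep the k largest-magnitude entries, zero the rest.

    A standard contractive compressor, satisfying
    ||top_k(x) - x||^2 <= (1 - k/d) * ||x||^2 for x in R^d.
    """
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]   # indices of the k largest |x_i|
    out[idx] = x[idx]
    return out

x = np.array([0.1, -3.0, 0.5, 2.0])
print(top_k(x, 2))   # keeps only -3.0 and 2.0; other entries become 0
```

Only the k selected values (and their indices) need to be transmitted to a neighbor, which is where the bit savings come from.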
## 7 Citations

• Computer Science
• 2022
This work formalizes the multi-token semi-decentralized scheme, which subsumes the client-server and decentralized setups, and designs a feature-distributed learning algorithm for this setup, which can be viewed as a parallel Markov chain (block) coordinate descent algorithm.
• Computer Science
ArXiv
• 2022
This paper proposes a coreset framework that constructs coresets in a distributed fashion for communication-efficient VFL, and theoretically shows that using coresets can drastically reduce the communication complexity while nearly maintaining the solution quality.
• Computer Science
ArXiv
• 2022
This work proposes and analyzes several stochastic gradient algorithms for finding stationary points or local minima in nonconvex (possibly with a nonsmooth regularizer) finite-sum and online optimization problems, and proposes an optimal algorithm, called SSRGD, based on SARAH, which can find an $\epsilon$-approximate (first-order) stationary point by simply adding some random perturbations.
• Computer Science
ArXiv
• 2022
A framework called SoteriaFL is proposed, which accommodates a general family of local gradient estimators, including popular stochastic variance-reduced gradient methods and the state-of-the-art shifted compression scheme, and is shown to achieve better communication complexity than other private federated learning algorithms without communication compression, without sacrificing privacy or utility.
• Computer Science
ArXiv
• 2022
A convergence lower bound is established for algorithms using unbiased or contractive compressors, whether unidirectionally or bidirectionally, and an algorithm is proposed, NEOLITHIC, which almost reaches the lower bound (up to logarithmic factors) under mild conditions.
• Computer Science
SIAM J. Math. Data Sci.
• 2022
A new algorithm, called DEcentralized STochastic REcurSive gradient methodS (DESTRESS), for nonconvex optimization, which matches the optimal incremental first-order oracle complexity of centralized algorithms for finding stationary points, while maintaining communication efficiency.

## References

Showing 1–10 of 64 references

• Computer Science
IEEE Journal on Selected Areas in Information Theory
• 2021
The theoretical analysis is corroborated with experiments comparing the algorithm against the state of the art, showing that SQuARM-SGD converges at a similar rate while significantly reducing the total communicated bits, without sacrificing much accuracy.
• Computer Science
ICLR
• 2020
The use of communication compression in the decentralized training context achieves linear speedup in the number of workers and supports higher compression than previous state-of-the-art methods.
• Computer Science
• 2019
This paper proposes an elegant algorithmic design, named DeepSqueeze, that employs error-compensated stochastic gradient descent in the decentralized scenario, and is the first to apply error-compensated compression to decentralized learning.
• Computer Science
NIPS
• 2017
Quantized SGD is proposed, a family of compression schemes for gradient updates that provides convergence guarantees, leads to significant reductions in end-to-end training time, and can be extended to stochastic variance-reduced techniques.
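The core idea is to round each gradient coordinate stochastically so the compressed vector remains unbiased. Below is a hedged numpy sketch of this kind of unbiased stochastic quantization; the function name and parameterization are mine, not the paper's exact scheme:

```python
import numpy as np

def quantize(x, s, rng):
    """Unbiased stochastic quantization with s levels (QSGD-style sketch).

    Each |x_i| / ||x|| is rounded to a multiple of 1/s, randomly up or down,
    with probabilities chosen so that E[quantize(x)] = x.
    """
    norm = np.linalg.norm(x)
    if norm == 0.0:
        return np.zeros_like(x)
    ratio = np.abs(x) / norm * s                        # position on the grid [0, s]
    low = np.floor(ratio)
    level = low + (rng.random(x.shape) < ratio - low)   # round up w.p. the fraction
    return norm * np.sign(x) * level / s

rng = np.random.default_rng(0)
x = np.array([0.3, -0.4])
q = quantize(x, 4, rng)   # every entry of q lies on the grid {±||x|| * l/4}
```

Since each entry takes one of only s + 1 magnitudes, it can be encoded in a few bits plus a sign, while the update stays unbiased in expectation.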
• Computer Science
ICML
• 2019
This work presents a novel gossip-based stochastic gradient descent algorithm, CHOCO-SGD, that converges at rate $\mathcal{O}\left(1/(nT) + 1/(T \delta^2 \omega)^2\right)$ for strongly convex objectives, where $T$ denotes the number of iterations, $\delta$ the eigengap of the connectivity matrix, and $\omega$ the compression ratio.
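The eigengap $\delta$ in the rate above measures how quickly gossip mixing spreads information over the graph. A minimal numpy sketch of plain (uncompressed) gossip averaging on a ring illustrates the mechanism; the topology and weights are illustrative choices of mine:

```python
import numpy as np

# Gossip averaging on a 5-node ring with a doubly stochastic mixing matrix W.
# The consensus error contracts each round at a rate governed by the eigengap
# of W -- the delta appearing in the CHOCO-SGD rate above.
n = 5
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5                      # self weight
    W[i, (i - 1) % n] = 0.25           # left neighbor
    W[i, (i + 1) % n] = 0.25           # right neighbor

x = np.arange(n, dtype=float)          # initial local values 0, 1, ..., 4
avg = x.mean()                         # the consensus target (2.0)
for _ in range(100):
    x = W @ x                          # one gossip round
print(np.allclose(x, avg))             # → True: all nodes reach the average
```

CHOCO-SGD interleaves such gossip rounds with compressed communication of iterate differences; this sketch only shows the uncompressed mixing step.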
• Computer Science
NeurIPS
• 2018
This work analyzes Stochastic Gradient Descent with k-sparsification or compression (for instance top-k or random-k) and shows that this scheme converges at the same rate as vanilla SGD when equipped with error compensation.
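The error-compensation idea summarized above, accumulating whatever the compressor discarded and adding it back into the next update, can be sketched as follows (the toy objective, names, and step size are mine):

```python
import numpy as np

def ef_step(w, grad, mem, lr, k):
    """One error-feedback step: compress (lr*grad + mem) with top-k,
    apply the compressed part, and keep the residual in local memory."""
    p = lr * grad + mem
    out = np.zeros_like(p)
    idx = np.argsort(np.abs(p))[-k:]   # top-k sparsification
    out[idx] = p[idx]
    return w - out, p - out            # new iterate, new memory

# Minimize f(w) = 0.5 * ||w||^2 (gradient = w), transmitting 1 of 3 coordinates.
w, mem = np.array([1.0, 2.0, 3.0]), np.zeros(3)
for _ in range(500):
    w, mem = ef_step(w, w, mem, lr=0.1, k=1)
print(np.linalg.norm(w))               # close to 0: the memory term recovers
                                       # the progress that compression dropped
```

No coordinate's contribution is ever lost: a dropped component stays in `mem` until it is large enough to be selected, which is why the scheme can match the vanilla SGD rate.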
This paper studies the D-PSGD algorithm and provides the first theoretical analysis indicating a regime in which decentralized algorithms can outperform centralized algorithms for distributed stochastic gradient descent.
• Computer Science
NeurIPS
• 2021
It is proved that EF21 enjoys a fast $O(1/T)$ convergence rate for smooth nonconvex problems, beating the previous bound of $O(1/T^{2/3})$, which was shown under a strong bounded gradients assumption.
• Computer Science
IEEE Transactions on Automatic Control
• 2022
It is shown that C-GT inherits the advantages of gradient tracking-based algorithms and achieves a linear convergence rate for strongly convex and smooth objective functions.
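Gradient tracking, the mechanism C-GT builds on, maintains an auxiliary variable on each node that tracks the network-average gradient. A minimal numpy sketch on a toy quadratic problem (the setup and step size are mine, not from the paper):

```python
import numpy as np

# Gradient tracking sketch: node i holds f_i(x) = 0.5 * (x - b_i)^2, plus a
# tracker y_i that estimates the average gradient across all nodes.
n, lr = 4, 0.2
W = np.full((n, n), 1.0 / n)           # complete-graph mixing for simplicity
b = np.array([1.0, 2.0, 3.0, 4.0])     # local minimizers; global optimum 2.5
x = np.zeros(n)
g = x - b                              # local gradients at the current iterates
y = g.copy()                           # trackers start at the local gradients
for _ in range(300):
    x_new = W @ x - lr * y             # consensus step plus tracked gradient
    g_new = x_new - b
    y = W @ y + g_new - g              # tracker update: sum(y) == sum(g) always
    x, g = x_new, g_new
print(x)                               # every node is near the optimum 2.5
```

The invariant that the trackers sum to the sum of local gradients is what lets every node descend on the *global* objective despite only seeing its own data; C-GT additionally compresses the exchanged quantities.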
• Computer Science, Mathematics
SIAM J. Optim.
• 2022
Over an infinite time horizon, it is established that all nodes in GT-SARAH asymptotically achieve consensus and converge to a first-order stationary point in the almost-sure and mean-squared sense.