• Corpus ID: 239769317

# Faster non-convex federated learning via global and local momentum

@inproceedings{Das2020FasterNF,
  title={Faster non-convex federated learning via global and local momentum},
  author={Rudrajit Das and Anish Acharya and Abolfazl Hashemi and Sujay Sanghavi and I. Dhillon and Ufuk Topcu},
  booktitle={Conference on Uncertainty in Artificial Intelligence},
  year={2020}
}
• Published in Conference on Uncertainty in Artificial Intelligence
• 7 December 2020
• Computer Science
We propose FedGLOMO, a novel federated learning (FL) algorithm with an iteration complexity of $O(\epsilon^{-1.5})$ to converge to an $\epsilon$-stationary point (i.e., $\mathbb{E}[\|\nabla f(x)\|^2] \leq \epsilon$) for smooth non-convex functions – under arbitrary client heterogeneity and compressed communication – compared to the $O(\epsilon^{-2})$ complexity of most prior works. Our key algorithmic idea that enables achieving this improved complexity is based on the observation that the…
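The momentum-based variance-reduction idea underlying this improved rate can be illustrated with a small sketch (a STORM-style estimator on a toy quadratic; the function names, constants, and objective here are illustrative, not the paper's actual FedGLOMO updates):

```python
import numpy as np

def grad(x, eps):
    # Stochastic gradient of the toy objective f(x) = 0.5 * ||x||^2,
    # with additive noise eps standing in for minibatch sampling noise.
    return x + eps

def momentum_vr(x0, lr=0.1, a=0.3, steps=300, noise=0.1, seed=0):
    # Variance-reduced momentum estimator (STORM-style):
    #   d_t = g(x_t; xi_t) + (1 - a) * (d_{t-1} - g(x_{t-1}; xi_t))
    # The SAME noise sample xi_t is used at both points, so the
    # correction term cancels most of the gradient noise.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = grad(x, noise * rng.standard_normal(x.shape))
    for _ in range(steps):
        x_prev, x = x, x - lr * d
        eps = noise * rng.standard_normal(x.shape)
        d = grad(x, eps) + (1 - a) * (d - grad(x_prev, eps))
    return x

x = momentum_vr(np.ones(5))
print(np.linalg.norm(x))  # gradient norm ||∇f(x)|| = ||x||, driven near zero
```

Reusing one sample at two consecutive iterates is what distinguishes this estimator from plain heavy-ball momentum, and it is the mechanism that lets such methods beat the $O(\epsilon^{-2})$ barrier without large batches.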
## 19 Citations

• Computer Science
• 2022
A novel federated learning framework is proposed that improves the stability of the server-side aggregation step; this is achieved by sending the clients an accelerated model, estimated with the global gradient, to guide the local gradient updates.
• Computer Science
ArXiv
• 2022
This paper proposes an efficient adaptive algorithm based on the momentum-based variance-reduction technique in cross-silo FL and proves that it is the first adaptive FL algorithm to reach the best-known sample complexity without large batches.
• Computer Science
ArXiv
• 2022
A multiscale algorithmic framework which combines theoretical guarantees of FedAvg and FedOMD algorithms in near-stationary settings with a non-stationary detection and adaptation technique to ameliorate FL generalization performance in the presence of model/concept drifts is introduced.
• Computer Science
2021 IEEE International Symposium on Information Theory (ISIT)
• 2021
Federated learning has become the de facto framework for collaborative learning among edge devices with privacy concerns, and it is shown that it is possible to address both problems using a single strategy, without any major alteration to the FL framework or introducing additional computation and communication load.
• Computer Science
• 2022
This work proposes FedDebias, a novel unified algorithm that reduces the local learning bias on features and classifiers to tackle the challenges caused by local updates in supervised FL.
• Computer Science
ArXiv
• 2022
It is shown that FedAug consistently outperforms other SOTA FL and domain generalization (DG) baselines, in which both components (i.e., AugMean and AugCA) have individual performance gains, and it is demonstrated that the DG algorithms help to enhance domain robustness.
• Computer Science
UAI
• 2022
Through extensive experiments, it is shown that FedVARP outperforms state-of-the-art methods, and ClusterFedVARP achieves performance comparable to FedVARP with much lower memory requirements.
This work studies the Nonconvex-Strongly-Concave (NSC) minimax optimization, and proposes a class of accelerated federated minimax optimization methods (i.e., FGDA and AdaFGDA) to solve the distributed minimax problems.
• Computer Science
ICML
• 2021
BVR-L-SGD achieves better communication complexity than both the previous non-local and local methods under mild conditions, and in particular BVR-L-SGD is the first method that breaks the barrier of communication complexity Θ(1/ε) for general nonconvex smooth objectives when the heterogeneity is small and the local computation budget is large.
• Computer Science
ArXiv
• 2021
This paper proposes to have the clients send a private quantized version of only the unit vector along the change in their local parameters to the server, completely throwing away the magnitude information, and introduces QTDL, a new differentially private quantization mechanism for unit-norm vectors, which it uses in DP-NormFedAvg.
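The core normalization idea in the summary above can be sketched in a few lines, independently of the quantization and differential-privacy machinery (which is omitted here); the function names and step sizes are illustrative, not the paper's API:

```python
import numpy as np

def unit_direction(delta, eps=1e-12):
    # Keep only the direction of a client's local model change;
    # the magnitude is discarded before communication.
    n = np.linalg.norm(delta)
    return delta / n if n > eps else np.zeros_like(delta)

def server_step(x, client_deltas, server_lr=0.1):
    # The server averages the unit directions and applies its own
    # step size, so no single client can dominate the global update
    # through a large-magnitude local change.
    agg = np.mean([unit_direction(d) for d in client_deltas], axis=0)
    return x + server_lr * agg

x = server_step(np.zeros(2), [np.array([3.0, 4.0]), np.array([0.0, 2.0])])
print(x)  # [0.03 0.09]: mean of [0.6, 0.8] and [0.0, 1.0], scaled by 0.1
```

Because every transmitted vector has unit norm, the message space is bounded, which is what makes the private quantization of such updates tractable.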

## References

Showing 1–10 of 51 references

• Computer Science
ICML
• 2020
This work obtains tight convergence rates for FedAvg and proves that it suffers from 'client-drift' when the data is heterogeneous (non-iid), resulting in unstable and slow convergence, and proposes a new algorithm (SCAFFOLD) which uses control variates (variance reduction) to correct for the 'client-drift' in its local updates.
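The control-variate correction can be sketched on a toy heterogeneous problem with two quadratic clients; this is an illustrative reading of the idea, not SCAFFOLD's exact pseudocode:

```python
import numpy as np

def scaffold_round(x, c, clients, c_locals, lr=0.1, K=10):
    # One communication round: each client runs K corrected local steps
    #   y <- y - lr * (grad_i(y) - c_i + c)
    # then refreshes its control variate (an option-II-style update).
    deltas, new_c = [], []
    for grad_i, c_i in zip(clients, c_locals):
        y = x.copy()
        for _ in range(K):
            y = y - lr * (grad_i(y) - c_i + c)
        new_c.append(c_i - c + (x - y) / (K * lr))
        deltas.append(y - x)
    x = x + np.mean(deltas, axis=0)  # server model update
    c = np.mean(new_c, axis=0)       # server control variate
    return x, c, new_c

# Two heterogeneous clients: f_i(y) = 0.5 * ||y - b_i||^2, optimum at mean(b_i).
b1, b2 = np.array([1.0]), np.array([3.0])
clients = [lambda y: y - b1, lambda y: y - b2]
x, c = np.zeros(1), np.zeros(1)
c_locals = [np.zeros(1), np.zeros(1)]
for _ in range(30):
    x, c, c_locals = scaffold_round(x, c, clients, c_locals)
print(x)  # ≈ [2.0], the global optimum, despite each client drifting toward b_i
```

Without the `- c_i + c` correction, each client's local steps pull toward its own optimum, and the averaged model oscillates between them; the control variates estimate and cancel exactly that drift.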
• Computer Science
NeurIPS
• 2021
This work considers a class of stochastic algorithms where the WNs perform a few local updates before communication and shows that there is a trade-off curve between the number of local updates and the minibatch sizes, on which the above sample and communication complexities can be maintained.
• Computer Science
ArXiv
• 2020
A comprehensive study of how FedAvg's convergence scales with the number of participating devices in the FL setting is provided and the first linear speedup guarantees for FedAvg are established when Nesterov acceleration is used, as well as a new momentum-based FL algorithm that improves the convergence rate in overparameterized linear regression problems.
• Computer Science
AISTATS
• 2020
FedPAQ is presented, a communication-efficient Federated Learning method with Periodic Averaging and Quantization that achieves near-optimal theoretical guarantees for strongly convex and non-convex loss functions and empirically demonstrate the communication-computation tradeoff provided by the method.
• Computer Science
ICLR
• 2020
This paper analyzes the convergence of Federated Averaging on non-iid data and establishes a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGD steps.
• Computer Science
IEEE Transactions on Parallel and Distributed Systems
• 2021
This paper shows that having constant step size gradient iterations – and gossip steps between every pair of these iterations – enables convergence to within $\epsilon$ of the optimal value for a class of non-convex problems that arise in the training of deep learning models, namely, smooth non-convex objectives satisfying the Polyak–Łojasiewicz condition.
• Computer Science
ICML
• 2020
This paper introduces a unified convergence analysis that covers a large variety of decentralized SGD methods which so far have required different intuitions, have different applications, and which have been developed separately in various communities.
• Computer Science
MLSys
• 2020
This work introduces a framework, FedProx, to tackle heterogeneity in federated networks, and provides convergence guarantees for this framework when learning over data from non-identical distributions (statistical heterogeneity), and while adhering to device-level systems constraints by allowing each participating device to perform a variable amount of work.
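The proximal idea in the FedProx summary above can be sketched in a few lines (an illustrative reading of the local objective, not the authors' implementation; the toy client and all constants are assumptions):

```python
import numpy as np

def fedprox_local(x_global, grad_i, mu=1.0, lr=0.05, steps=200):
    # Local training on h_i(y) = f_i(y) + (mu/2) * ||y - x_global||^2:
    # the proximal term pulls the client's iterate back toward the
    # server model, limiting drift under statistical heterogeneity
    # and tolerating a variable number of local steps per device.
    y = np.asarray(x_global, dtype=float).copy()
    for _ in range(steps):
        y = y - lr * (grad_i(y) + mu * (y - x_global))
    return y

# A client whose local optimum (b = 1) differs from the server model (x = 0):
b = np.array([1.0])
y = fedprox_local(np.zeros(1), lambda y: y - b)
print(y)  # ≈ [0.5]: the proximal minimizer (b + mu*x)/(1 + mu), not the raw b
```

Setting `mu=0` recovers plain FedAvg local training; larger `mu` keeps stragglers and highly non-iid clients closer to the server model at the cost of slower local progress.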
• Computer Science
ArXiv
• 2019
These results show that SGD is robust to compressed and/or delayed stochastic gradient updates, which is particularly important for distributed parallel implementations, where asynchronous and communication-efficient methods are the key to achieving linear speedups for optimization with multiple devices.
Concise convergence rates are proved for local SGD on convex problems, and it is shown that it converges at the same rate as mini-batch SGD in terms of the number of evaluated gradients, that is, the scheme achieves linear speedup in the number of workers and the mini-batch size.