• Corpus ID: 239769317

Faster non-convex federated learning via global and local momentum

@inproceedings{Das2020FasterNF,
  title={Faster non-convex federated learning via global and local momentum},
  author={Rudrajit Das and Anish Acharya and Abolfazl Hashemi and Sujay Sanghavi and I. Dhillon and Ufuk Topcu},
  booktitle={Conference on Uncertainty in Artificial Intelligence},
  year={2020}
}
We propose FedGLOMO, a novel federated learning (FL) algorithm with an iteration complexity of $\mathcal{O}(\epsilon^{-1.5})$ to converge to an $\epsilon$-stationary point (i.e., $\mathbb{E}[\|\nabla f(x)\|^2] \le \epsilon$) for smooth non-convex functions, under arbitrary client heterogeneity and compressed communication, compared to the $\mathcal{O}(\epsilon^{-2})$ complexity of most prior works. Our key algorithmic idea that enables achieving this improved complexity is based on the observation that the…
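As a rough illustration of the global-plus-local momentum idea described above (not the authors' exact FedGLOMO algorithm), the toy Python sketch below applies a momentum-style variance-reduction correction inside each client's local updates and a second momentum term on the server's aggregation of compressed client updates. The quadratic objective, the `top_k` compressor, and all constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, CLIENTS, ROUNDS, LOCAL_STEPS = 5, 8, 50, 4
ETA_LOCAL, BETA = 0.1, 0.9          # illustrative local step size and momentum weight

# Heterogeneous quadratic client objectives f_i(x) = 0.5 * ||x - c_i||^2 (toy stand-in).
centers = rng.normal(size=(CLIENTS, DIM))

def grad(i, x):
    return x - centers[i]

def top_k(v, k=2):
    """Toy top-k sparsifier standing in for compressed communication."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

x = np.zeros(DIM)                   # global model
m = np.zeros(DIM)                   # server-side (global) momentum buffer
for _ in range(ROUNDS):
    updates = []
    for i in range(CLIENTS):
        y = x.copy()
        d = grad(i, y)              # local momentum / variance-reduction buffer
        for _ in range(LOCAL_STEPS):
            y_next = y - ETA_LOCAL * d
            # Momentum-style variance reduction: fresh gradient plus a damped
            # correction carrying the previous direction's deviation forward.
            d = grad(i, y_next) + (1 - BETA) * (d - grad(i, y))
            y = y_next
        updates.append(top_k(y - x))         # clients send compressed model deltas
    m = BETA * m + np.mean(updates, axis=0)  # global momentum on the aggregated update
    x = x + m

print("distance to average client optimum:", np.linalg.norm(x - centers.mean(axis=0)))
```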

Citations

Communication-Efficient Federated Learning with Acceleration of Global Momentum

A novel federated learning framework is proposed that improves the stability of the server-side aggregation step; this is achieved by sending the clients an accelerated model, estimated with the global gradient, to guide the local gradient updates.
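A minimal sketch of what "sending the clients an accelerated model to guide local updates" could look like, assuming a simple Nesterov-style server lookahead; this is an interpretation rather than the cited paper's algorithm, and all names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM, CLIENTS, ROUNDS, LOCAL_STEPS, ETA, BETA = 4, 6, 40, 3, 0.2, 0.8

centers = rng.normal(size=(CLIENTS, DIM))   # heterogeneous quadratic client optima
grad = lambda i, w: w - centers[i]

x = np.zeros(DIM)        # global model kept by the server
m = np.zeros(DIM)        # momentum of the aggregated client updates
for _ in range(ROUNDS):
    lookahead = x + BETA * m                # accelerated model broadcast to the clients
    deltas = []
    for i in range(CLIENTS):
        y = lookahead.copy()                # local training starts from the accelerated point
        for _ in range(LOCAL_STEPS):
            y -= ETA * grad(i, y)
        deltas.append(y - lookahead)
    m = BETA * m + np.mean(deltas, axis=0)  # momentum on the averaged update
    x = x + m

print("distance to average client optimum:", np.linalg.norm(x - centers.mean(axis=0)))
```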

Faster Adaptive Federated Learning

This paper proposes an efficient adaptive algorithm based on the momentum-based variance-reduction technique in cross-silo FL and proves that it is the first adaptive FL algorithm to reach the best-known sample complexity without requiring large batches.

Online Federated Learning via Non-Stationary Detection and Adaptation amidst Concept Drift

A multiscale algorithmic framework is introduced that combines the theoretical guarantees of the FedAvg and FedOMD algorithms in near-stationary settings with a non-stationary detection and adaptation technique to improve FL generalization performance in the presence of model/concept drift.

FedADC: Accelerated Federated Learning with Drift Control

Federated learning has become the de facto framework for collaborative learning among edge devices with privacy concerns, and it is shown that both problems can be addressed using a single strategy, without any major alteration to the FL framework or additional computation and communication load.

FedDebias: Reducing the Local Learning Bias Improves Federated Learning on Heterogeneous Data

This work proposes FedDebias, a novel unified algorithm that reduces the local learning bias on features and classifiers to tackle the challenges caused by local updates in supervised FL.

FedAug: Reducing the Local Learning Bias Improves Federated Learning on Heterogeneous Data

It is shown that FedAug consistently outperforms other SOTA FL and domain generalization (DG) baselines, that both of its components (i.e., AugMean and AugCA) yield individual performance gains, and that the DG algorithms help to enhance domain robustness.

FedVARP: Tackling the Variance Due to Partial Client Participation in Federated Learning

Through extensive experiments, it is shown that FedVARP outperforms state-of-the-art methods, and that ClusterFedVARP achieves performance comparable to FedVARP with much lower memory requirements.

Adaptive Federated Minimax Optimization with Lower Complexities

This work studies Nonconvex-Strongly-Concave (NSC) minimax optimization and proposes a class of accelerated federated minimax optimization methods (i.e., FGDA and AdaFGDA) to solve distributed minimax problems.

Bias-Variance Reduced Local SGD for Less Heterogeneous Federated Learning

BVR-L-SGD achieves better communication complexity than both previous non-local and local methods under mild conditions; in particular, BVR-L-SGD is the first method that breaks the $\Theta(1/\epsilon)$ communication-complexity barrier for general non-convex smooth objectives when the heterogeneity is small and the local computation budget is large.

DP-NormFedAvg: Normalizing Client Updates for Privacy-Preserving Federated Learning

This paper proposes to have the clients send a private, quantized version of only the unit vector along the change in their local parameters to the server, completely discarding the magnitude information, and introduces QTDL, a new differentially private quantization mechanism for unit-norm vectors, which is used in DP-NormFedAvg.
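The communication pattern described here (send only a quantized unit direction, discard the magnitude) can be sketched as follows. The `stochastic_quantize` routine is a generic stochastic rounding scheme, not the paper's QTDL mechanism, no differential-privacy noise is added, and all constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
DIM, CLIENTS, ROUNDS, LOCAL_STEPS, ETA, SERVER_STEP = 4, 5, 80, 3, 0.2, 0.3

def stochastic_quantize(u, levels=16):
    """Generic unbiased stochastic quantizer for a vector with entries in [-1, 1].

    NOT the paper's QTDL mechanism and adds no differential-privacy noise; it only
    illustrates the 'direction-only, quantized' communication pattern.
    """
    scaled = (u + 1.0) / 2.0 * (levels - 1)            # map [-1, 1] -> [0, levels - 1]
    low = np.floor(scaled)
    q = low + (rng.random(u.shape) < scaled - low)     # randomized rounding (unbiased)
    return q / (levels - 1) * 2.0 - 1.0                # map back to [-1, 1]

centers = rng.normal(size=(CLIENTS, DIM))              # heterogeneous client optima
grad = lambda i, w: w - centers[i]

x = np.zeros(DIM)
for t in range(ROUNDS):
    directions = []
    for i in range(CLIENTS):
        y = x.copy()
        for _ in range(LOCAL_STEPS):
            y -= ETA * grad(i, y)
        delta = y - x
        unit = delta / (np.linalg.norm(delta) + 1e-12)  # magnitude is discarded
        directions.append(stochastic_quantize(unit))
    # Server moves along the averaged quantized direction with a decaying step size.
    x = x + SERVER_STEP / np.sqrt(t + 1) * np.mean(directions, axis=0)

print("distance to average client optimum:", np.linalg.norm(x - centers.mean(axis=0)))
```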

References

SHOWING 1-10 OF 51 REFERENCES

SCAFFOLD: Stochastic Controlled Averaging for Federated Learning

This work obtains tight convergence rates for FedAvg and proves that it suffers from 'client-drift' when the data is heterogeneous (non-iid), resulting in unstable and slow convergence, and proposes a new algorithm (SCAFFOLD) which uses control variates (variance reduction) to correct for the 'client-drift' in its local updates.
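A simplified, full-participation sketch of the control-variate correction described above, loosely following SCAFFOLD's update rules on a toy quadratic problem; the step sizes, the objective, and the "option II"-style control-variate update are assumptions written from memory rather than the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)
DIM, CLIENTS, ROUNDS, K, ETA_L, ETA_G = 4, 6, 50, 5, 0.1, 1.0

centers = rng.normal(size=(CLIENTS, DIM))    # heterogeneous client optima (non-iid stand-in)
grad = lambda i, w: w - centers[i]

x = np.zeros(DIM)                            # global model
c = np.zeros(DIM)                            # server control variate
c_i = np.zeros((CLIENTS, DIM))               # per-client control variates
for _ in range(ROUNDS):
    delta_x, delta_c = [], []
    for i in range(CLIENTS):
        y = x.copy()
        for _ in range(K):
            # Drift-corrected local step: local gradient, minus the client's
            # control variate, plus the server's control variate.
            y -= ETA_L * (grad(i, y) - c_i[i] + c)
        c_new = c_i[i] - c + (x - y) / (K * ETA_L)   # "option II"-style control update
        delta_x.append(y - x)
        delta_c.append(c_new - c_i[i])
        c_i[i] = c_new
    x = x + ETA_G * np.mean(delta_x, axis=0)
    c = c + np.mean(delta_c, axis=0)         # full participation, so no sampling factor

print("distance to average client optimum:", np.linalg.norm(x - centers.mean(axis=0)))
```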

STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal Sample and Communication Complexities for Federated Learning

This work considers a class of stochastic algorithms in which the worker nodes (WNs) perform a few local updates before communication, and shows that there is a trade-off curve between the number of local updates and the minibatch sizes on which the above sample and communication complexities can be maintained.

Federated Learning's Blessing: FedAvg has Linear Speedup

A comprehensive study of how FedAvg's convergence scales with the number of participating devices in the FL setting is provided, and the first linear speedup guarantees for FedAvg are established when Nesterov acceleration is used; a new momentum-based FL algorithm is also proposed that improves the convergence rate in overparameterized linear regression problems.

FedPAQ: A Communication-Efficient Federated Learning Method with Periodic Averaging and Quantization

FedPAQ is presented, a communication-efficient federated learning method with periodic averaging and quantization that achieves near-optimal theoretical guarantees for strongly convex and non-convex loss functions and empirically demonstrates the communication-computation trade-off the method provides.

On the Convergence of FedAvg on Non-IID Data

This paper analyzes the convergence of Federated Averaging on non-iid data and establishes a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGD iterations.

On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Federated Learning

This paper shows that constant-step-size gradient iterations, with gossip steps between every pair of these iterations, enable convergence to within $\epsilon$ of the optimal value for a class of non-convex problems that arise in the training of deep learning models, namely smooth non-convex objectives satisfying the Polyak-Łojasiewicz condition.
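A toy decentralized sketch of the "multiple gossip steps between gradient iterations" scheme described above: each node takes one constant-step-size gradient step and then performs several gossip (neighbor-averaging) rounds over a ring topology. The mixing matrix, the quadratic objectives, and the constants are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(4)
DIM, NODES, ITERS, GOSSIP_STEPS, ETA = 4, 6, 100, 5, 0.1

centers = rng.normal(size=(NODES, DIM))      # heterogeneous local objectives
grad = lambda i, w: w - centers[i]

# Ring topology with a symmetric, doubly stochastic mixing matrix.
W = np.zeros((NODES, NODES))
for i in range(NODES):
    W[i, i] = 0.5
    W[i, (i - 1) % NODES] = 0.25
    W[i, (i + 1) % NODES] = 0.25

X = np.zeros((NODES, DIM))                   # one model per node
for _ in range(ITERS):
    # One constant-step-size gradient iteration at every node...
    X = X - ETA * np.stack([grad(i, X[i]) for i in range(NODES)])
    # ...followed by several gossip (neighbor-averaging) steps over the network.
    for _ in range(GOSSIP_STEPS):
        X = W @ X

consensus_err = np.linalg.norm(X - X.mean(axis=0, keepdims=True))
opt_err = np.linalg.norm(X.mean(axis=0) - centers.mean(axis=0))
print(f"consensus error: {consensus_err:.3e}, optimality error: {opt_err:.3e}")
```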

A Unified Theory of Decentralized SGD with Changing Topology and Local Updates

This paper introduces a unified convergence analysis that covers a large variety of decentralized SGD methods which so far have required different intuitions, have different applications, and which have been developed separately in various communities.

Federated Optimization in Heterogeneous Networks

This work introduces a framework, FedProx, to tackle heterogeneity in federated networks, and provides convergence guarantees for this framework when learning over data from non-identical distributions (statistical heterogeneity), and while adhering to device-level systems constraints by allowing each participating device to perform a variable amount of work.

The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication

These results show that SGD is robust to compressed and/or delayed stochastic gradient updates, which is particularly important for distributed parallel implementations, where asynchronous and communication-efficient methods are the key to achieving linear speedups for optimization with multiple devices.
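A minimal sketch of the error-feedback idea for compressed updates: the portion of the update dropped by a biased compressor is accumulated and re-injected at the next step. The `top_k` compressor and all constants are illustrative, and this is a single-worker toy rather than the distributed and delayed-gradient setting analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(5)
DIM, STEPS, ETA = 10, 300, 0.1

target = rng.normal(size=DIM)                # optimum of f(x) = 0.5 * ||x - target||^2
grad = lambda w: w - target

def top_k(v, k=2):
    """Biased top-k sparsifier; error feedback compensates for what it drops."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

x = np.zeros(DIM)
e = np.zeros(DIM)                            # accumulated compression error (feedback buffer)
for _ in range(STEPS):
    p = ETA * grad(x) + e                    # add back the previously dropped mass
    delta = top_k(p)                         # only this compressed update is communicated
    e = p - delta                            # remember what was dropped this time
    x = x - delta

print("distance to optimum:", np.linalg.norm(x - target))
```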

Local SGD Converges Fast and Communicates Little

Concise convergence rates are proved for local SGD on convex problems, showing that it converges at the same rate as mini-batch SGD in terms of the number of evaluated gradients; that is, the scheme achieves linear speedup in the number of workers and the mini-batch size.
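A small sketch of the local SGD scheme analyzed above: each worker runs several local stochastic gradient steps and the models are then averaged, so communication happens only once per round. The per-worker quadratic objective, the noise model, and the constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
DIM, WORKERS, ROUNDS, LOCAL_STEPS, ETA = 4, 8, 30, 5, 0.1

centers = rng.normal(size=(WORKERS, DIM))    # each worker has its own data distribution

def stoch_grad(i, w):
    # Noisy gradient of a per-worker quadratic, standing in for a minibatch gradient.
    return (w - centers[i]) + 0.01 * rng.normal(size=DIM)

x = np.zeros(DIM)
for _ in range(ROUNDS):
    models = []
    for i in range(WORKERS):
        y = x.copy()
        for _ in range(LOCAL_STEPS):         # several local SGD steps before syncing
            y -= ETA * stoch_grad(i, y)
        models.append(y)
    x = np.mean(models, axis=0)              # periodic averaging: one communication per round

print("distance to average worker optimum:", np.linalg.norm(x - centers.mean(axis=0)))
```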
...