# First Analysis of Local GD on Heterogeneous Data

@article{Khaled2019FirstAO, title={First Analysis of Local GD on Heterogeneous Data}, author={Ahmed Khaled and Konstantin Mishchenko and Peter Richt{\'a}rik}, journal={ArXiv}, year={2019}, volume={abs/1909.04715} }

We provide the first convergence analysis of local gradient descent for minimizing the average of smooth and convex but otherwise arbitrary functions. Problems of this form and local gradient descent as a solution method are of importance in federated learning, where each function is based on private data stored by a user on a mobile device, and the data of different users can be arbitrarily heterogeneous. We show that in a low accuracy regime, the method has the same communication complexity…
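As a rough illustration of the scheme analyzed in the paper, the sketch below runs local gradient descent on heterogeneous objectives: each device takes several GD steps on its own loss, then a communication round averages the iterates. The quadratic losses, step size, and round counts here are illustrative choices, not taken from the paper.

```python
import numpy as np

def local_gd(x0, grad_fns, rounds=20, local_steps=5, lr=0.1):
    """Local GD: each worker runs `local_steps` gradient steps on its
    own objective, then all local iterates are averaged (one
    communication round)."""
    x = np.array(x0, dtype=float)
    for _ in range(rounds):
        local_iterates = []
        for grad in grad_fns:              # one pass per device
            y = x.copy()
            for _ in range(local_steps):
                y -= lr * grad(y)          # local gradient step
            local_iterates.append(y)
        x = np.mean(local_iterates, axis=0)  # communication: average
    return x

# Heterogeneous data: f_i(x) = 0.5*(x - c_i)^2 with distinct minimizers c_i
cs = [np.array([1.0]), np.array([3.0]), np.array([5.0])]
grads = [lambda x, c=c: x - c for c in cs]
x_star = local_gd(np.zeros(1), grads)
# converges toward the minimizer of the average loss, here 3.0
```

Each round contracts the distance to the average minimizer by a fixed factor, so a handful of communication rounds already suffices at low accuracy, mirroring the low-accuracy regime discussed above.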

## 93 Citations

From Local SGD to Local Fixed Point Methods for Federated Learning

- Computer Science, Mathematics · ICML
- 2020

This work considers the generic problem of finding a fixed point of an average of operators, or an approximation thereof, in a distributed setting, and investigates two strategies to achieve such a consensus: one based on a fixed number of local steps, and the other based on randomized computations.

On the Convergence of Local Descent Methods in Federated Learning

- Computer Science · ArXiv
- 2019

The obtained convergence rates are the sharpest known to date on the convergence of local descent methods with periodic averaging for solving nonconvex federated optimization in both centralized and networked distributed optimization.

Linear Convergence in Federated Learning: Tackling Client Heterogeneity and Sparse Gradients

- Computer Science · NeurIPS
- 2021

This work is the first to provide tight linear convergence rate guarantees, and constitutes the first comprehensive analysis of gradient sparsification in FL.

A Stochastic Newton Algorithm for Distributed Convex Optimization

- Computer Science · NeurIPS
- 2021

It is shown that this stochastic Newton algorithm can reduce the number and frequency of required communication rounds compared to existing methods without hurting performance, by proving convergence guarantees for quasi-self-concordant objectives (e.g., logistic regression), alongside empirical evidence.

Gradient Descent with Compressed Iterates

- Computer Science · ArXiv
- 2019

We propose and analyze a new type of stochastic first-order method: gradient descent with compressed iterates (GDCI). GDCI in each iteration first compresses the current iterate using a lossy…

On the Convergence of FedAvg on Non-IID Data

- Computer Science · ICLR
- 2020

This paper analyzes the convergence of Federated Averaging on non-iid data and establishes a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGD iterations.

Second-Order Guarantees in Federated Learning

- Computer Science · 2020 54th Asilomar Conference on Signals, Systems, and Computers
- 2020

This work draws on recent results on the second-order optimality of stochastic gradient algorithms in centralized and decentralized settings, and establishes second-order guarantees for a class of federated learning algorithms.

Tighter Theory for Local SGD on Identical and Heterogeneous Data

- Computer Science · AISTATS
- 2020

We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the…

Communication-Efficient Distributed SVD via Local Power Iterations

- Computer Science · ICML
- 2021

An algorithm is developed that uniformly partitions the dataset among nodes and alternates between multiple local power iterations and one global aggregation to improve communication efficiency; it is theoretically shown that, under certain assumptions, this algorithm lowers the required number of communications.
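The "local power iterations plus periodic aggregation" idea described in this entry can be sketched as follows: each node runs a few power iterations on its local Gram matrix, and an aggregation round averages and renormalizes the vectors. The aggregation rule, iteration counts, and demo matrices below are illustrative assumptions, not details from the paper.

```python
import numpy as np

def local_power_svd(blocks, dim, rounds=30, local_iters=3, seed=0):
    """Top right singular direction via local power iterations:
    each node iterates with its local Gram matrix A_i^T A_i, then
    one aggregation round averages and renormalizes the vectors."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    grams = [A.T @ A for A in blocks]          # local Gram matrices
    for _ in range(rounds):
        local_vecs = []
        for G in grams:
            u = v.copy()
            for _ in range(local_iters):       # local power iterations
                u = G @ u
                u /= np.linalg.norm(u)
            local_vecs.append(u)
        v = np.mean(local_vecs, axis=0)        # one aggregation round
        v /= np.linalg.norm(v)
    return v

# Demo: two row-blocks of a matrix whose dominant direction is e1
blocks = [np.diag([5.0, 1.0, 0.5])] * 2
v_top = local_power_svd(blocks, dim=3)
```

Running several power iterations per communication round is what lowers the number of aggregations compared to one aggregation per iteration.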

Iterated Vector Fields and Conservatism, with Applications to Federated Learning

- Mathematics, Computer Science · ALT
- 2022

It is shown that, for certain classes of functions, federated averaging is equivalent to gradient descent on a surrogate loss function, a connection that is used to derive novel convergence results for federated learning algorithms.

## References

Showing 1-10 of 21 references

A Delay-tolerant Proximal-Gradient Algorithm for Distributed Learning

- Computer Science · ICML
- 2018

This work proposes and analyzes a flexible asynchronous optimization algorithm for solving nonsmooth learning problems, and proves that the algorithm converges linearly with a fixed learning rate that depends on neither the communication delays nor the number of machines.

Adaptive Federated Learning in Resource Constrained Edge Computing Systems

- Computer Science · IEEE Journal on Selected Areas in Communications
- 2019

This paper analyzes the convergence bound of distributed gradient descent from a theoretical point of view, and proposes a control algorithm that determines the best tradeoff between local update and global parameter aggregation to minimize the loss function under a given resource budget.

On the Convergence of FedAvg on Non-IID Data

- Computer Science · ICLR
- 2020

This paper analyzes the convergence of Federated Averaging on non-iid data and establishes a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGD iterations.

Local SGD Converges Fast and Communicates Little

- Computer Science · ICLR
- 2019

Concise convergence rates are proved for local SGD on convex problems, showing that it converges at the same rate as mini-batch SGD in terms of the number of evaluated gradients; that is, the scheme achieves linear speedup in the number of workers and the mini-batch size.

When Edge Meets Learning: Adaptive Control for Resource-Constrained Distributed Machine Learning

- Computer Science · IEEE INFOCOM 2018 - IEEE Conference on Computer Communications
- 2018

This paper analyzes the convergence rate of distributed gradient descent from a theoretical point of view, and proposes a control algorithm that determines the best trade-off between local update and global parameter aggregation to minimize the loss function under a given resource budget.

On the convergence properties of a K-step averaging stochastic gradient descent algorithm for nonconvex optimization

- Computer Science · IJCAI
- 2018

A synchronous K-step averaging stochastic gradient descent algorithm, called K-AVG, is proposed and analyzed for solving large-scale machine learning problems; it achieves better accuracy and faster convergence when training on the CIFAR-10 dataset.

Federated Learning: Strategies for Improving Communication Efficiency

- Computer Science · ArXiv
- 2016

Two ways to reduce the uplink communication costs are proposed: structured updates, where the user directly learns an update from a restricted space parametrized using a smaller number of variables, e.g. either low-rank or a random mask; and sketched updates, which learn a full model update and then compress it using a combination of quantization, random rotations, and subsampling.
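The "sketched update" pipeline described here — subsample an update, then quantize the survivors — can be sketched in a few lines. The parameter names, the uniform quantizer, and the random-mask rule below are illustrative assumptions in the spirit of that description, not the paper's exact construction.

```python
import numpy as np

def sketch_update(update, keep_frac=0.1, levels=256, seed=0):
    """Compress a model update: keep a random fraction of coordinates
    (random mask subsampling), then uniformly quantize the kept values
    to a small number of levels."""
    rng = np.random.default_rng(seed)
    mask = rng.random(update.shape) < keep_frac   # random subsampling
    kept = np.where(mask, update, 0.0)
    scale = float(np.max(np.abs(kept))) or 1.0    # guard against all-zero
    half = levels // 2
    # uniform quantization of kept values onto `levels` buckets
    return np.round(kept / scale * half) / half * scale

# Demo: compress a dense update of ones to a sparse, quantized vector
u = np.ones(1000)
compressed = sketch_update(u, keep_frac=0.1, seed=0)
```

Only the surviving indices and their quantized values need to be uplinked, which is what cuts the communication cost relative to sending the full dense update.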

Communication-Efficient Learning of Deep Networks from Decentralized Data

- Computer Science · AISTATS
- 2017

This work presents a practical method for the federated learning of deep networks based on iterative model averaging, and conducts an extensive empirical evaluation, considering five different model architectures and four datasets.

Revisiting Stochastic Extragradient

- Computer Science · AISTATS
- 2020

This work fixes a fundamental issue in the stochastic extragradient method by providing a new sampling strategy motivated by approximating implicit updates, and proves guarantees for solving variational inequalities that go beyond existing settings.

On the Convergence of Federated Optimization in Heterogeneous Networks

- Computer Science · ArXiv
- 2018

This work proposes and introduces FedProx, which is similar in spirit to FedAvg but more amenable to theoretical analysis, and describes the convergence of FedProx under a novel *device similarity* assumption.