• Corpus ID: 202558609

# First Analysis of Local GD on Heterogeneous Data

```bibtex
@article{Khaled2019FirstAO,
  title   = {First Analysis of Local GD on Heterogeneous Data},
  author  = {Ahmed Khaled and Konstantin Mishchenko and Peter Richt{\'a}rik},
  journal = {ArXiv},
  year    = {2019},
  volume  = {abs/1909.04715}
}
```
• Published 10 September 2019
• Computer Science
• ArXiv
We provide the first convergence analysis of local gradient descent for minimizing the average of smooth and convex but otherwise arbitrary functions. Problems of this form and local gradient descent as a solution method are of importance in federated learning, where each function is based on private data stored by a user on a mobile device, and the data of different users can be arbitrarily heterogeneous. We show that in a low accuracy regime, the method has the same communication complexity…
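The method analyzed in the abstract is easy to sketch. Below is a minimal toy illustration (the setup is mine, not the paper's): each of M devices holds a heterogeneous quadratic f_i(x) = ½‖x − b_i‖², runs several local gradient steps, and the local iterates are averaged at each communication round. The global minimizer of the average is the mean of the b_i.

```python
import numpy as np

def local_gd(b, n_rounds=50, local_steps=10, lr=0.1):
    """Toy local GD on f_i(x) = 0.5 * ||x - b_i||^2 with heterogeneous b_i.

    Each of the M devices runs `local_steps` gradient steps on its own f_i;
    one communication round then averages all local iterates.
    """
    M, d = b.shape
    x = np.zeros((M, d))           # one local iterate per device
    for _ in range(n_rounds):
        for _ in range(local_steps):
            x -= lr * (x - b)      # grad f_i(x_i) = x_i - b_i
        x[:] = x.mean(axis=0)      # communication: average local iterates
    return x[0]

# Despite arbitrarily heterogeneous b_i, the method converges to the
# global minimizer, the mean of the b_i.
rng = np.random.default_rng(0)
b = rng.normal(size=(5, 3))
print(np.allclose(local_gd(b), b.mean(axis=0), atol=1e-6))  # True
```

On this toy problem each round contracts the error by a constant factor, so few communication rounds suffice, which mirrors the low-accuracy-regime claim in the abstract.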

## Citations

From Local SGD to Local Fixed Point Methods for Federated Learning
• Computer Science, Mathematics
ICML
• 2020
This work considers the generic problem of finding a fixed point of an average of operators, or an approximation thereof, in a distributed setting, and investigates two strategies to achieve such a consensus: one based on a fixed number of local steps, and the other based on randomized computations.
On the Convergence of Local Descent Methods in Federated Learning
• Computer Science
ArXiv
• 2019
The obtained convergence rates are the sharpest known to date on the convergence of local descent methods with periodic averaging for solving nonconvex federated optimization in both centralized and networked distributed settings.
Linear Convergence in Federated Learning: Tackling Client Heterogeneity and Sparse Gradients
• Computer Science
NeurIPS
• 2021
This work is the first to provide tight linear convergence rate guarantees, and constitutes the first comprehensive analysis of gradient sparsification in FL.
A Stochastic Newton Algorithm for Distributed Convex Optimization
• Computer Science
NeurIPS
• 2021
It is shown that this stochastic Newton algorithm can reduce the number, and frequency, of required communication rounds compared to existing methods without hurting performance, by proving convergence guarantees for quasi-self-concordant objectives (e.g., logistic regression), alongside empirical evidence.
Gradient Descent with Compressed Iterates
• Computer Science
ArXiv
• 2019
We propose and analyze a new type of stochastic first-order method: gradient descent with compressed iterates (GDCI). In each iteration, GDCI first compresses the current iterate using a lossy…
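The compressed-iterates idea in this snippet can be sketched concretely. One plausible instantiation (the `rand-k` compressor and the exact update form are my illustrative choices, not necessarily the paper's) compresses the iterate with an unbiased sparsifier before each gradient evaluation:

```python
import numpy as np

def rand_k(x, k, rng):
    """Unbiased rand-k compressor: keep k random coordinates, rescale by d/k."""
    out = np.zeros_like(x)
    idx = rng.choice(x.size, size=k, replace=False)
    out[idx] = x[idx] * (x.size / k)
    return out

def gd_compressed_iterates(grad, x0, lr=0.05, k=2, n_iter=2000, seed=0):
    """Gradient descent where each step sees only a lossy compression
    of the current iterate (a sketch of the GDCI idea)."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(n_iter):
        cx = rand_k(x, k, rng)   # lossy compression of the iterate
        x = x - lr * grad(cx)    # gradient evaluated at the compressed point
    return x

# Quadratic f(x) = 0.5 * ||x - b||^2; compression noise means the iterates
# settle in a neighborhood of the minimizer b rather than converging exactly.
b = np.array([1.0, -2.0, 0.5, 3.0])
x = gd_compressed_iterates(lambda z: z - b, np.zeros(4))
```

The step size controls the trade-off: smaller `lr` shrinks the neighborhood induced by the compressor's variance but slows the contraction toward it.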
On the Convergence of FedAvg on Non-IID Data
• Computer Science
ICLR
• 2020
This paper analyzes the convergence of Federated Averaging on non-iid data and establishes a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGD iterations.
Second-Order Guarantees in Federated Learning
• Computer Science
2020 54th Asilomar Conference on Signals, Systems, and Computers
• 2020
This work draws on recent results on the second-order optimality of stochastic gradient algorithms in centralized and decentralized settings, and establishes second-order guarantees for a class of federated learning algorithms.
Tighter Theory for Local SGD on Identical and Heterogeneous Data
• Computer Science
AISTATS
• 2020
We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the…
Communication-Efficient Distributed SVD via Local Power Iterations
• Computer Science
ICML
• 2021
An algorithm is developed that uniformly partitions the dataset among nodes and alternates between multiple local power iterations and one global aggregation to improve communication efficiency; it is theoretically shown that, under certain assumptions, this algorithm lowers the required number of communication rounds.
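The alternating local/global structure this snippet describes can be sketched for the top singular vector. The following is a simplified illustration under my own assumptions (a planted spectral gap, a simple averaging aggregator), not the paper's algorithm verbatim:

```python
import numpy as np

def local_power_iteration(blocks, d, local_steps=4, n_rounds=20, seed=0):
    """Estimate the top right singular vector of A = vstack(blocks).

    Each node holds a row block A_i and runs `local_steps` power iterations
    on its local Gram matrix A_i^T A_i; each global round then averages
    and renormalizes the local estimates (one communication per round).
    """
    rng = np.random.default_rng(seed)
    v_avg = rng.normal(size=d)
    v_avg /= np.linalg.norm(v_avg)
    for _ in range(n_rounds):
        vs = []
        for A in blocks:
            v = v_avg
            for _ in range(local_steps):
                v = A.T @ (A @ v)          # local power step on A_i^T A_i
                v /= np.linalg.norm(v)
            vs.append(v)
        v_avg = np.mean(vs, axis=0)        # global aggregation
        v_avg /= np.linalg.norm(v_avg)
    return v_avg

# Planted-spectrum example: a 40x5 matrix with one dominant singular value,
# split row-wise across 4 nodes.
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.normal(size=(40, 5)))
V, _ = np.linalg.qr(rng.normal(size=(5, 5)))
A = (U * np.array([10.0, 1.0, 1.0, 1.0, 1.0])) @ V.T
v = local_power_iteration(np.array_split(A, 4), d=5)
```

With `local_steps` power iterations per round, communication happens once per round instead of once per iteration, which is the efficiency gain the snippet refers to.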
Iterated Vector Fields and Conservatism, with Applications to Federated Learning
• Mathematics, Computer Science
ALT
• 2022
It is shown that for certain classes of functions, federated averaging is equivalent to gradient descent on a surrogate loss function; this connection is used to derive novel convergence results for federated learning algorithms.

## References

Showing 1-10 of 21 references.
A Delay-tolerant Proximal-Gradient Algorithm for Distributed Learning
• Computer Science
ICML
• 2018
This work proposes and analyzes a flexible asynchronous optimization algorithm for solving nonsmooth learning problems and proves that the algorithm converges linearly with a fixed learning rate that does not depend on communication delays nor on the number of machines.
Adaptive Federated Learning in Resource Constrained Edge Computing Systems
• Computer Science
IEEE Journal on Selected Areas in Communications
• 2019
This paper analyzes the convergence bound of distributed gradient descent from a theoretical point of view, and proposes a control algorithm that determines the best tradeoff between local update and global parameter aggregation to minimize the loss function under a given resource budget.
On the Convergence of FedAvg on Non-IID Data
• Computer Science
ICLR
• 2020
This paper analyzes the convergence of Federated Averaging on non-iid data and establishes a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGD iterations.
Local SGD Converges Fast and Communicates Little
• Computer Science
ICLR
• 2019
Concise convergence rates are proved for local SGD on convex problems, showing that it converges at the same rate as mini-batch SGD in terms of the number of evaluated gradients; that is, the scheme achieves linear speedup in the number of workers and the mini-batch size.
When Edge Meets Learning: Adaptive Control for Resource-Constrained Distributed Machine Learning
• Computer Science
IEEE INFOCOM 2018 - IEEE Conference on Computer Communications
• 2018
This paper analyzes the convergence rate of distributed gradient descent from a theoretical point of view, and proposes a control algorithm that determines the best trade-off between local update and global parameter aggregation to minimize the loss function under a given resource budget.
On the convergence properties of a K-step averaging stochastic gradient descent algorithm for nonconvex optimization
• Computer Science
IJCAI
• 2018
A synchronous K-step averaging stochastic gradient descent algorithm, called K-AVG, is proposed and analyzed for solving large-scale machine learning problems; it achieves better accuracy and faster convergence when training on the CIFAR-10 dataset.
Federated Learning: Strategies for Improving Communication Efficiency
• Computer Science
ArXiv
• 2016
Two ways to reduce the uplink communication costs are proposed: structured updates, where the user directly learns an update from a restricted space parametrized using a smaller number of variables, e.g. either low-rank or a random mask; and sketched updates, which learn a full model update and then compress it using a combination of quantization, random rotations, and subsampling.
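The "sketched updates" idea (a random mask plus quantization) can be roughly illustrated as follows; the function names and the simple uniform quantizer are my own illustrative choices, not the paper's exact scheme:

```python
import numpy as np

def sketch_update(update, keep_frac=0.1, n_bits=4, seed=0):
    """Compress a model update: random-mask subsampling, then uniform
    quantization of the surviving values to n_bits (assumes max > min)."""
    rng = np.random.default_rng(seed)
    mask = rng.random(update.shape) < keep_frac       # random mask
    vals = update[mask]
    lo, hi = vals.min(), vals.max()
    levels = 2 ** n_bits - 1
    q = np.round((vals - lo) / (hi - lo) * levels)    # uniform quantizer
    return mask, q.astype(np.uint8), (lo, hi)

def unsketch(mask, q, bounds, n_bits=4):
    """Server-side reconstruction: dequantize and scatter into a zero vector."""
    lo, hi = bounds
    out = np.zeros(mask.shape)
    out[mask] = lo + q / (2 ** n_bits - 1) * (hi - lo)
    return out

# Round trip: the reconstruction error on kept coordinates is at most
# half a quantization step, (hi - lo) / (2 * (2**n_bits - 1)).
rng = np.random.default_rng(2)
upd = rng.normal(size=1000)
mask, q, bounds = sketch_update(upd)
rec = unsketch(mask, q, bounds)
```

Here the uplink cost drops from 1000 floats to roughly 100 four-bit values plus the mask seed and two scalars, at the price of a bounded per-coordinate quantization error on the kept entries.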
Communication-Efficient Learning of Deep Networks from Decentralized Data
• Computer Science
AISTATS
• 2017
This work presents a practical method for the federated learning of deep networks based on iterative model averaging, and conducts an extensive empirical evaluation, considering five different model architectures and four datasets.