Corpus ID: 202558609

First Analysis of Local GD on Heterogeneous Data

@article{Khaled2019FirstAO,
  title={First Analysis of Local GD on Heterogeneous Data},
  author={Ahmed Khaled and Konstantin Mishchenko and Peter Richt{\'a}rik},
  journal={ArXiv},
  year={2019},
  volume={abs/1909.04715}
}
We provide the first convergence analysis of local gradient descent for minimizing the average of smooth and convex but otherwise arbitrary functions. Problems of this form and local gradient descent as a solution method are of importance in federated learning, where each function is based on private data stored by a user on a mobile device, and the data of different users can be arbitrarily heterogeneous. We show that in a low accuracy regime, the method has the same communication complexity… 
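
To make the setting concrete, here is a minimal sketch (not from the paper) of local gradient descent with periodic averaging; the quadratic per-device losses, step size, number of local steps, and number of rounds are illustrative assumptions.

```python
import numpy as np

def local_gd(grads, x0, lr=0.1, local_steps=10, rounds=50):
    """Local GD with periodic averaging: each device takes `local_steps`
    gradient steps on its own objective, then the local models are averaged
    (one communication round)."""
    x = np.array(x0, dtype=float)
    for _ in range(rounds):
        local_models = []
        for grad in grads:                  # one model copy per device
            xi = x.copy()
            for _ in range(local_steps):    # local updates, no communication
                xi -= lr * grad(xi)
            local_models.append(xi)
        x = np.mean(local_models, axis=0)   # averaging / communication step
    return x

# Toy heterogeneous problem: f_i(x) = 0.5 * ||x - b_i||^2 with distinct b_i,
# so the average loss is minimized at mean(b_i).
rng = np.random.default_rng(0)
bs = [rng.normal(size=5) for _ in range(4)]
grads = [lambda x, b=b: x - b for b in bs]
print(np.linalg.norm(local_gd(grads, np.zeros(5)) - np.mean(bs, axis=0)))
```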

Citations

From Local SGD to Local Fixed Point Methods for Federated Learning
TLDR
This work considers the generic problem of finding a fixed point of an average of operators, or an approximation thereof, in a distributed setting, and investigates two strategies to achieve such a consensus: one based on a fixed number of local steps, and the other based on randomized computations.
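
As a rough illustration of the first strategy mentioned above (a fixed number of local steps), the following toy sketch averages the results of iterating each node's operator locally; the contractive affine operators and all parameters are made up for the example and are not the paper's algorithm.

```python
import numpy as np

def local_fixed_point(operators, x0, local_steps=5, rounds=100):
    """Each node applies its own operator `local_steps` times starting from the
    shared point, then the local iterates are averaged (one communication)."""
    x = np.array(x0, dtype=float)
    for _ in range(rounds):
        local_points = []
        for T in operators:
            xi = x.copy()
            for _ in range(local_steps):
                xi = T(xi)
            local_points.append(xi)
        x = np.mean(local_points, axis=0)
    return x

# Toy contractive affine operators T_i(x) = 0.5 * x + b_i.
rng = np.random.default_rng(1)
ops = [lambda x, b=b: 0.5 * x + b for b in (rng.normal(size=4) for _ in range(3))]
# The returned point approximates a fixed point of the averaged operator,
# up to a bias induced by the local steps.
print(local_fixed_point(ops, np.zeros(4)))
```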
On the Convergence of Local Descent Methods in Federated Learning
TLDR
The obtained convergence rates are the sharpest known to date on the convergence of local descent methods with periodic averaging for solving nonconvex federated optimization in both centralized and networked distributed optimization.
Linear Convergence in Federated Learning: Tackling Client Heterogeneity and Sparse Gradients
TLDR
This work is the first to provide tight linear convergence rate guarantees, and constitutes the first comprehensive analysis of gradient sparsification in FL.
A Stochastic Newton Algorithm for Distributed Convex Optimization
TLDR
Convergence guarantees are proved for quasi-self-concordant objectives (e.g., logistic regression), and it is shown, with supporting empirical evidence, that this stochastic Newton algorithm can reduce the number and frequency of required communication rounds compared to existing methods without hurting performance.
Gradient Descent with Compressed Iterates
We propose and analyze a new type of stochastic first order method: gradient descent with compressed iterates (GDCI). GDCI in each iteration first compresses the current iterate using a lossy…
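
A minimal sketch of one natural reading of GDCI, assuming a random-sparsification compressor and a gradient evaluated at the compressed iterate; the compressor, step size, and toy objective are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def rand_sparsify(x, k, rng):
    """Hypothetical lossy compressor: keep k random coordinates, rescaled so the
    compression is unbiased in expectation."""
    mask = np.zeros_like(x)
    idx = rng.choice(x.size, size=k, replace=False)
    mask[idx] = x.size / k
    return mask * x

def gd_compressed_iterates(grad, x0, lr=0.05, k=2, steps=500, seed=0):
    """Each iteration compresses the current iterate, then takes a gradient step
    using the gradient evaluated at the compressed point."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad(rand_sparsify(x, k, rng))
    return x

# Toy quadratic f(x) = 0.5 * ||x - b||^2; the compression noise keeps the
# iterates in a neighborhood of the minimizer b rather than converging exactly.
b = np.array([1.0, -2.0, 0.5, 3.0])
print(gd_compressed_iterates(lambda x: x - b, np.zeros(4)))
```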
On the Convergence of FedAvg on Non-IID Data
TLDR
This paper analyzes the convergence of Federated Averaging on non-iid data and establishes a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the total number of SGD steps.
Second-Order Guarantees in Federated Learning
TLDR
This work draws on recent results on the second-order optimality of stochastic gradient algorithms in centralized and decentralized settings, and establishes second-order guarantees for a class of federated learning algorithms.
Tighter Theory for Local SGD on Identical and Heterogeneous Data
We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the…
Communication-Efficient Distributed SVD via Local Power Iterations
TLDR
An algorithm is developed that uniformly partitions the dataset among nodes and alternates between multiple local power iterations and one global aggregation to improve communication efficiency, and it is theoretically shown that, under certain assumptions, this algorithm lowers the required number of communications.
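
A toy sketch of that alternation, assuming the goal is the top right singular vector of row-partitioned data; the shard sizes, number of local power iterations, and averaging-based aggregation are illustrative choices, not the paper's algorithm.

```python
import numpy as np

def local_power_iteration(shards, p=4, rounds=20, seed=0):
    """Estimate the top right singular vector of the row-stacked data: each node
    runs p power iterations on its local Gram matrix, then the local vectors are
    sign-aligned, averaged, and renormalized (one communication per round)."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=shards[0].shape[1])
    v /= np.linalg.norm(v)
    for _ in range(rounds):
        local_vs = []
        for A in shards:
            vi, G = v.copy(), A.T @ A        # local Gram matrix
            for _ in range(p):               # local power iterations
                vi = G @ vi
                vi /= np.linalg.norm(vi)
            local_vs.append(vi if vi @ v >= 0 else -vi)  # fix sign ambiguity
        v = np.mean(local_vs, axis=0)        # global aggregation
        v /= np.linalg.norm(v)
    return v

# Toy check against a centralized SVD on uniformly partitioned rows.
rng = np.random.default_rng(2)
A = rng.normal(size=(200, 10)) * np.linspace(3.0, 1.0, 10)  # anisotropic columns
v_hat = local_power_iteration(np.array_split(A, 4))
v_true = np.linalg.svd(A, full_matrices=False)[2][0]
print(abs(v_hat @ v_true))  # close to 1 when the estimate aligns with the truth
```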
Iterated Vector Fields and Conservatism, with Applications to Federated Learning
TLDR
It is shown that for certain classes of functions, federated averaging is equivalent to gradient descent on a surrogate loss function; this connection to optimization is used to derive novel convergence results for federated learning algorithms.

References

Showing 1-10 of 21 references
A Delay-tolerant Proximal-Gradient Algorithm for Distributed Learning
TLDR
This work proposes and analyzes a flexible asynchronous optimization algorithm for solving nonsmooth learning problems and proves that the algorithm converges linearly with a fixed learning rate that does not depend on communication delays nor on the number of machines.
Adaptive Federated Learning in Resource Constrained Edge Computing Systems
TLDR
This paper analyzes the convergence bound of distributed gradient descent from a theoretical point of view, and proposes a control algorithm that determines the best tradeoff between local update and global parameter aggregation to minimize the loss function under a given resource budget.
On the Convergence of FedAvg on Non-IID Data
TLDR
This paper analyzes the convergence of Federated Averaging on non-iid data and establishes a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the total number of SGD steps.
Local SGD Converges Fast and Communicates Little
TLDR
Concise convergence rates are proved for local SGD on convex problems, and it is shown that it converges at the same rate as mini-batch SGD in terms of the number of evaluated gradients, that is, the scheme achieves linear speedup in the number of workers and the mini-batch size.
When Edge Meets Learning: Adaptive Control for Resource-Constrained Distributed Machine Learning
TLDR
This paper analyzes the convergence rate of distributed gradient descent from a theoretical point of view, and proposes a control algorithm that determines the best trade-off between local update and global parameter aggregation to minimize the loss function under a given resource budget.
On the convergence properties of a K-step averaging stochastic gradient descent algorithm for nonconvex optimization
TLDR
A synchronous K-step averaging stochastic gradient descent algorithm, called K-AVG, is adopted and analyzed for solving large-scale machine learning problems; it achieves better accuracy and faster convergence when training on the CIFAR-10 dataset.
Federated Learning: Strategies for Improving Communication Efficiency
TLDR
Two ways to reduce the uplink communication costs are proposed: structured updates, where the user directly learns an update from a restricted space parametrized using a smaller number of variables, e.g. either low-rank or a random mask; and sketched updates, which learn a full model update and then compress it using a combination of quantization, random rotations, and subsampling.
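
As a rough illustration of the sketched-updates idea (subsampling plus quantization; random rotations are omitted), here is a hypothetical compressor, not the paper's implementation; the keep fraction and number of quantization levels are made-up parameters.

```python
import numpy as np

def sketch_update(update, keep_frac=0.1, levels=16, rng=None):
    """Compress a model update before uplink: subsample a random mask of
    coordinates, then uniformly quantize the kept values to `levels` buckets."""
    rng = rng or np.random.default_rng()
    k = max(1, int(keep_frac * update.size))
    idx = rng.choice(update.size, size=k, replace=False)
    vals = update[idx]
    lo, hi = vals.min(), vals.max()
    q = np.round((vals - lo) / max(hi - lo, 1e-12) * (levels - 1))
    return idx, lo + q / (levels - 1) * (hi - lo)   # indices + quantized values

def apply_sketch(shape, idx, vals):
    """Server-side reconstruction of the sparse, quantized update."""
    out = np.zeros(shape)
    out[idx] = vals
    return out

rng = np.random.default_rng(3)
delta = rng.normal(size=1000)
idx, vals = sketch_update(delta, rng=rng)
recovered = apply_sketch(delta.shape, idx, vals)
print(idx.size, np.linalg.norm(recovered - delta) / np.linalg.norm(delta))
```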
Communication-Efficient Learning of Deep Networks from Decentralized Data
TLDR
This work presents a practical method for the federated learning of deep networks based on iterative model averaging, and conducts an extensive empirical evaluation, considering five different model architectures and four datasets.
Revisiting Stochastic Extragradient
TLDR
This work fixes a fundamental issue in the stochastic extragradient method by providing a new sampling strategy that is motivated by approximating implicit updates, and proves guarantees for solving variational inequality that go beyond existing settings.
On the Convergence of Federated Optimization in Heterogeneous Networks
TLDR
This work proposes and introduces FedProx, which is similar in spirit to FedAvg but more amenable to theoretical analysis, and describes the convergence of FedProx under a novel device similarity assumption.
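
A minimal sketch of a FedProx-style round, assuming inexact local solves by gradient steps on toy quadratic losses; the proximal coefficient, step size, and objectives are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

def fedprox(grads, x0, mu=0.1, lr=0.1, local_steps=10, rounds=50):
    """FedProx-style round: each device takes gradient steps on its own loss
    plus the proximal term (mu/2)*||x - x_global||^2, which anchors local
    iterates to the current global model; the server then averages."""
    x = np.array(x0, dtype=float)
    for _ in range(rounds):
        local_models = []
        for grad in grads:
            xi = x.copy()
            for _ in range(local_steps):
                xi -= lr * (grad(xi) + mu * (xi - x))   # proximal-perturbed gradient
            local_models.append(xi)
        x = np.mean(local_models, axis=0)
    return x

# Same style of heterogeneous toy quadratics as in the local GD sketch above.
rng = np.random.default_rng(4)
bs = [rng.normal(size=5) for _ in range(4)]
grads = [lambda x, b=b: x - b for b in bs]
print(np.linalg.norm(fedprox(grads, np.zeros(5)) - np.mean(bs, axis=0)))
```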