• Corpus ID: 204575663

# SCAFFOLD: Stochastic Controlled Averaging for On-Device Federated Learning

@article{Karimireddy2019SCAFFOLDSC,
title={SCAFFOLD: Stochastic Controlled Averaging for On-Device Federated Learning},
author={Sai Praneeth Karimireddy and Satyen Kale and Mehryar Mohri and Sashank J. Reddi and Sebastian U. Stich and Ananda Theertha Suresh},
journal={ArXiv},
year={2019},
volume={abs/1910.06378}
}
• Published 14 October 2019
• Computer Science
• ArXiv
Federated learning is a key scenario in modern large-scale machine learning. In that scenario, the training data remains distributed over a large number of clients, which may be phones, other mobile devices, or network sensors, and a centralized model is learned without ever transmitting client data over the network. The standard optimization algorithm used in this scenario is Federated Averaging (FedAvg). However, when client data is heterogeneous, which is typical in applications, FedAvg does…
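As a rough illustration of the averaging scheme named in the abstract, the following is a minimal sketch of FedAvg on a toy problem, not the authors' implementation. It assumes full client participation, exact (non-stochastic) gradients, and scalar quadratic client objectives f_i(x) = 0.5 * a_i * (x - b_i)^2; the function names and the client values are hypothetical. Under these assumptions, one local step per round recovers the minimizer of the averaged objective, while several local steps on clients with different curvature settle at a different point.

```python
def local_steps(x, a, b, lr, k):
    """Run k gradient steps on f(x) = 0.5 * a * (x - b)**2 starting from x."""
    for _ in range(k):
        x -= lr * a * (x - b)
    return x


def fedavg(clients, lr, k, rounds, x0=0.0):
    """Toy FedAvg: each client trains locally, then the server averages the models."""
    x = x0
    for _ in range(rounds):
        x = sum(local_steps(x, a, b, lr, k) for a, b in clients) / len(clients)
    return x


def global_minimizer(clients):
    """Minimizer of the average of the client quadratics."""
    return sum(a * b for a, b in clients) / sum(a for a, _ in clients)


# Hypothetical heterogeneous clients given as (curvature a_i, local optimum b_i).
clients = [(1.0, 0.0), (4.0, 1.0)]

x_star = global_minimizer(clients)
x_one = fedavg(clients, lr=0.1, k=1, rounds=500)    # one local step per round
x_many = fedavg(clients, lr=0.1, k=10, rounds=500)  # ten local steps per round

print(f"global minimizer: {x_star:.4f}")
print(f"FedAvg, k=1:      {x_one:.4f}")
print(f"FedAvg, k=10:     {x_many:.4f}")
```

In this toy setting the gap between `x_many` and `x_star` comes purely from the mismatch between client objectives, since no sampling noise is involved.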
151 Citations
Breaking the centralized barrier for cross-device federated learning
• Computer Science
NeurIPS
• 2021
This work proposes a general algorithmic framework, MIME, which mitigates client drift and adapts an arbitrary centralized optimization algorithm such as momentum and Adam to the cross-device federated learning setting, and shows that MIME is provably faster than any centralized method.
Federated Learning Based on Dynamic Regularization
• Computer Science
ICLR
• 2021
This work proposes a novel federated learning method for distributively training neural network models, where the server orchestrates cooperation between a subset of randomly chosen devices in each round, using a dynamic regularizer for each device at each round.
Federated Learning under Arbitrary Communication Patterns
• Computer Science
ICML
• 2021
This paper investigates the performance of an asynchronous version of local SGD wherein the clients can communicate with the server at arbitrary time intervals and achieves convergence rates that match the synchronous version that requires all clients to communicate simultaneously.
On the Convergence of Local Descent Methods in Federated Learning
• Computer Science
ArXiv
• 2019
The obtained convergence rates are the sharpest known to date for local descent methods with periodic averaging for solving nonconvex federated optimization in both centralized and networked distributed optimization.
Communication-Efficient Federated Learning with Acceleration of Global Momentum
• Computer Science
• 2022
A novel federated learning framework is proposed that improves the stability of the server-side aggregation step by sending the clients an accelerated model, estimated with the global gradient, to guide the local gradient updates.
Local Adaptivity in Federated Learning: Convergence and Consistency
• Computer Science
ArXiv
• 2021
It is shown in both theory and practice that while local adaptive methods can accelerate convergence, they can cause a non-vanishing solution bias, where the final converged solution may be different from the stationary point of the global objective function.
FedDANE: A Federated Newton-Type Method
• Computer Science
2019 53rd Asilomar Conference on Signals, Systems, and Computers
• 2019
This work proposes FedDANE, an optimization method that is adapted from DANE, a method for classical distributed optimization, to handle the practical constraints of federated learning, and provides convergence guarantees for this method when learning over both convex and non-convex functions.
Server Averaging for Federated Learning
• Computer Science
ArXiv
• 2021
This work proposes the server averaging algorithm, which not only converges to a target accuracy faster than federated averaging (FedAvg) but also reduces computation costs at the client level through epoch decay.
MIME: MIMICKING CENTRALIZED STOCHASTIC AL-
• Computer Science
• 2020
This work proposes a general algorithmic framework, MIME, which mitigates client drift and adapts arbitrary centralized optimization algorithms such as SGD and Adam to the federated learning setting.
Subspace Learning for Personalized Federated Optimization
• Computer Science
ArXiv
• 2021
This work proposes a method to address the situation through the lens of ensemble learning based on the construction of a low-loss subspace continuum that generates a high-accuracy ensemble of two endpoints (i.e. global model and local model).

## References

SHOWING 1-10 OF 45 REFERENCES
Federated Learning: Strategies for Improving Communication Efficiency
• Computer Science
ArXiv
• 2016
Two ways to reduce the uplink communication costs are proposed: structured updates, where the user directly learns an update from a restricted space parametrized using a smaller number of variables, e.g. either low-rank or a random mask; and sketched updates, which learn a full model update and then compress it using a combination of quantization, random rotations, and subsampling.
Agnostic Federated Learning
• Computer Science
ICML
• 2019
This work proposes a new framework of agnostic federated learning, where the centralized model is optimized for any target distribution formed by a mixture of the client distributions, and shows that this framework naturally yields a notion of fairness.
Federated Optimization: Distributed Machine Learning for On-Device Intelligence
• Computer Science
ArXiv
• 2016
We introduce a new and increasingly relevant setting for distributed optimization in machine learning, where the data defining the optimization are unevenly distributed over an extremely large number
Adaptive Federated Learning in Resource Constrained Edge Computing Systems
• Computer Science
IEEE Journal on Selected Areas in Communications
• 2019
This paper analyzes the convergence bound of distributed gradient descent from a theoretical point of view, and proposes a control algorithm that determines the best tradeoff between local update and global parameter aggregation to minimize the loss function under a given resource budget.
On the Convergence of FedAvg on Non-IID Data
• Computer Science
ICLR
• 2020
This paper analyzes the convergence of Federated Averaging on non-IID data and establishes a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGD iterations.
Federated Learning with Non-IID Data
• Computer Science
ArXiv
• 2018
This work presents a strategy to improve training on non-IID data by creating a small subset of data which is globally shared between all the edge devices, and shows that accuracy can be increased by 30% for the CIFAR-10 dataset with only 5% globally shared data.
Communication-Efficient Learning of Deep Networks from Decentralized Data
• Computer Science
AISTATS
• 2017
This work presents a practical method for the federated learning of deep networks based on iterative model averaging, and conducts an extensive empirical evaluation, considering five different model architectures and four datasets.
Federated Learning: Challenges, Methods, and Future Directions
• Computer Science
IEEE Signal Processing Magazine
• 2020
The unique characteristics and challenges of federated learning are discussed, a broad overview of current approaches is provided, and several directions of future work that are relevant to a wide range of research communities are outlined.
Communication trade-offs for synchronized distributed SGD with large step size
• Computer Science
NeurIPS 2019
• 2019
A non-asymptotic error analysis is proposed, which enables comparison to one-shot averaging, and it is shown that local-SGD reduces communication by a factor of $O\Big(\frac{\sqrt{T}}{P^{3/2}}\Big)$, with $T$ the total number of gradients and $P$ machines.
CoCoA: A General Framework for Communication-Efficient Distributed Optimization
• Computer Science
J. Mach. Learn. Res.
• 2017
This work presents a general-purpose framework for distributed computing environments, CoCoA, that has an efficient communication scheme and is applicable to a wide variety of problems in machine learning and signal processing, and extends the framework to cover general non-strongly-convex regularizers, including L1-regularized problems like lasso.