Corpus ID: 236965883

FedPAGE: A Fast Local Stochastic Gradient Method for Communication-Efficient Federated Learning

@article{Zhao2021FedPAGEAF,
  title={FedPAGE: A Fast Local Stochastic Gradient Method for Communication-Efficient Federated Learning},
  author={Haoyu Zhao and Zhize Li and Peter Richt{\'a}rik},
  journal={ArXiv},
  year={2021},
  volume={abs/2108.04755}
}
Federated Averaging (FedAvg, also known as Local-SGD) (McMahan et al., 2017) is a classical federated learning algorithm in which clients run multiple local SGD steps before communicating their update to an orchestrating server. We propose a new federated learning algorithm, FedPAGE, able to further reduce the communication complexity by utilizing the recent optimal PAGE method (Li et al., 2021) instead of plain SGD in FedAvg. We show that FedPAGE uses much fewer communication rounds than… 
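
To make the local-update structure described above concrete, here is a minimal FedAvg-style sketch in Python/NumPy on a toy quadratic objective; the function names (local_grad, local_sgd, fedavg_round), the synthetic data, and all hyperparameters are illustrative choices, not taken from the paper.

import numpy as np

# Toy local objective: client i holds (A_i, b_i) and minimizes ||A_i x - b_i||^2.
def local_grad(A, b, x):
    return 2.0 * A.T @ (A @ x - b)

def local_sgd(A, b, x, lr=0.01, local_steps=10, batch=8, rng=None):
    """Run several local SGD steps starting from the server model x."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = x.copy()
    for _ in range(local_steps):
        idx = rng.choice(len(b), size=min(batch, len(b)), replace=False)
        x = x - lr * local_grad(A[idx], b[idx], x)
    return x

def fedavg_round(server_x, clients, lr=0.01, local_steps=10):
    """One FedAvg round: broadcast the model, train locally, average the returned models."""
    updates = [local_sgd(A, b, server_x, lr, local_steps) for (A, b) in clients]
    return np.mean(updates, axis=0)

# Usage: a few synthetic clients and a handful of communication rounds.
rng = np.random.default_rng(1)
clients = [(rng.normal(size=(50, 5)), rng.normal(size=50)) for _ in range(4)]
x = np.zeros(5)
for _ in range(20):
    x = fedavg_round(x, clients)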

Citations

Federated Learning Aggregation: New Robust Algorithms with Guarantees

TLDR
A complete general mathematical convergence analysis is carried out to evaluate aggregation strategies in a federated learning framework and derive novel aggregation algorithms which are able to modify their model architecture by differentiating client contributions according to the value of their losses.

Accelerating Federated Learning via Sampling Anchor Clients with Large Batches

TLDR
A unified framework, FedAMD, is proposed, which partitions the participants into anchor and miner groups based on time-varying probabilities and achieves a convergence rate of O(1/ε) for non-convex objectives by sampling anchors with a constant probability.

Faster Rates for Compressed Federated Learning with Client-Variance Reduction

TLDR
Neither COFIG nor FRECON needs to communicate with all clients, and both provide the first or faster convergence results for convex and nonconvex federated learning, while previous works either require communication with all clients or obtain worse convergence results.

SoteriaFL: A Unified Framework for Private Federated Learning with Communication Compression

TLDR
A framework, SoteriaFL, is proposed, which accommodates a general family of local gradient estimators, including popular stochastic variance-reduced gradient methods and the state-of-the-art shifted compression scheme, and is shown to achieve better communication complexity than other private federated learning algorithms without communication compression, while sacrificing neither privacy nor utility.

BEER: Fast O(1/T) Rate for Decentralized Nonconvex Optimization with Communication Compression

TLDR
This paper proposes BEER, which adopts communication compression with gradient tracking, and shows that it converges at a faster O(1/T) rate than the state-of-the-art rate, matching the rate without compression even under arbitrary data heterogeneity.

Deep Leakage from Model in Federated Learning

TLDR
Two novel frameworks, DLM and DLM+, are presented to demonstrate that transmitting model weights is also likely to leak clients' private local data under the FL scenario.

CANITA: Faster Rates for Distributed Convex Optimization with Communication Compression

TLDR
The results show that as long as the number of devices n is large, or the compression parameter ω is not very high, CANITA achieves a faster convergence rate that improves upon the state-of-the-art non-accelerated rate.

Personalized Federated Learning with Multiple Known Clusters

TLDR
This work develops an algorithm that allows each cluster to communicate independently, derives convergence results, and studies a hierarchical linear model to theoretically demonstrate that the approach outperforms both agents learning independently and agents learning a single shared weight.

FedControl: When Control Theory Meets Federated Learning

TLDR
This work differentiates client contributions in federated learning algorithms according to the performance of local learning and its evolution, weighting the coordinate-wise averaging of the model parameters accordingly.

EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern Error Feedback

TLDR
Six practical extensions of EF21 are proposed, all supported by strong convergence theory: partial participation, stochastic approximation, variance reduction, proximal setting, momentum and bidirectional compression.

References

SHOWING 1-10 OF 44 REFERENCES

SCAFFOLD: Stochastic Controlled Averaging for Federated Learning

TLDR
This work obtains tight convergence rates for FedAvg and proves that it suffers from 'client drift' when the data is heterogeneous (non-iid), resulting in unstable and slow convergence, and proposes a new algorithm (SCAFFOLD) which uses control variates (variance reduction) to correct for the client drift in its local updates.
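
As a rough illustration of the control-variate correction SCAFFOLD applies to local updates, the following Python/NumPy sketch implements one round under simplifying assumptions (full client participation, the simpler "option II" control-variate update, a toy quadratic objective); all names and hyperparameters are illustrative.

import numpy as np

def scaffold_round(x, c, c_locals, clients, local_grad, lr_local=0.01, lr_global=1.0, K=10):
    """One simplified SCAFFOLD round: full participation, option-II control-variate update."""
    delta_x, delta_c, c_locals_new = [], [], []
    for data, c_i in zip(clients, c_locals):
        y = x.copy()
        for _ in range(K):
            # Correct the local gradient with the control variates to counteract client drift.
            y = y - lr_local * (local_grad(data, y) - c_i + c)
        c_i_new = c_i - c + (x - y) / (K * lr_local)   # option-II control-variate update
        delta_x.append(y - x)
        delta_c.append(c_i_new - c_i)
        c_locals_new.append(c_i_new)
    x_new = x + lr_global * np.mean(delta_x, axis=0)   # server aggregates model deltas
    c_new = c + np.mean(delta_c, axis=0)               # all clients participate, so a plain mean
    return x_new, c_new, c_locals_new

# Usage on a tiny synthetic quadratic problem (illustrative only).
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 5)), rng.normal(size=50)) for _ in range(4)]
grad = lambda data, y: 2.0 * data[0].T @ (data[0] @ y - data[1])
x, c = np.zeros(5), np.zeros(5)
c_locals = [np.zeros(5) for _ in clients]
for _ in range(20):
    x, c, c_locals = scaffold_round(x, c, c_locals, clients, grad)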

On the Convergence of Local Descent Methods in Federated Learning

TLDR
The obtained convergence rates are the sharpest known to date on the convergence of local descent methods with periodic averaging for solving nonconvex federated optimization, in both centralized and networked distributed settings.

Agnostic Federated Learning

TLDR
This work proposes a new framework of agnostic federated learning, where the centralized model is optimized for any target distribution formed by a mixture of the client distributions, and shows that this framework naturally yields a notion of fairness.

On the Convergence of Federated Optimization in Heterogeneous Networks

TLDR
This work proposes FedProx, which is similar in spirit to FedAvg but more amenable to theoretical analysis, and describes the convergence of FedProx under a novel device similarity assumption.

Federated Learning: Strategies for Improving Communication Efficiency

TLDR
Two ways to reduce the uplink communication costs are proposed: structured updates, where the user directly learns an update from a restricted space parametrized using a smaller number of variables, e.g. either low-rank or a random mask; and sketched updates, which learn a full model update and then compress it using a combination of quantization, random rotations, and subsampling.
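
As a rough illustration of the sketched-updates idea (subsampling plus quantization), here is a minimal Python/NumPy sketch; it omits the random rotations mentioned in the summary, and the function names and parameters are illustrative rather than the paper's actual scheme.

import numpy as np

def sketch_update(update, keep_frac=0.1, num_levels=16, rng=None):
    """Compress a flat model update by random subsampling plus uniform quantization."""
    rng = np.random.default_rng(0) if rng is None else rng
    k = max(1, int(keep_frac * update.size))
    idx = rng.choice(update.size, size=k, replace=False)
    vals = update[idx]
    lo, hi = float(vals.min()), float(vals.max())
    scale = (hi - lo) / (num_levels - 1) if hi > lo else 1.0
    levels = np.round((vals - lo) / scale).astype(np.uint8)  # k small integers instead of k floats
    return idx, levels, lo, scale

def unsketch_update(size, idx, levels, lo, scale, keep_frac=0.1):
    """Server-side reconstruction; rescaling by 1/keep_frac compensates for the subsampling."""
    out = np.zeros(size)
    out[idx] = (lo + levels * scale) / keep_frac
    return out

# Usage: compress a random update and reconstruct a sparse estimate of it.
u = np.random.default_rng(1).normal(size=1000)
idx, levels, lo, scale = sketch_update(u, keep_frac=0.1)
u_hat = unsketch_update(u.size, idx, levels, lo, scale, keep_frac=0.1)

Only the kept indices, the quantized levels, and two floats need to be uploaded, which is the source of the communication savings in this kind of scheme.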

Federated Optimization: Distributed Machine Learning for On-Device Intelligence

We introduce a new and increasingly relevant setting for distributed optimization in machine learning, where the data defining the optimization are unevenly distributed over an extremely large number of nodes…

Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization

TLDR
This paper proposes the first accelerated compressed gradient descent (ACGD) methods and improves upon the existing non-accelerated rates and recovers the optimal rates of accelerated gradient descent as a special case when no compression is applied.

CANITA: Faster Rates for Distributed Convex Optimization with Communication Compression

TLDR
The results show that as long as the number of devices n is large, or the compression parameter ω is not very high, CANITA achieves a faster convergence rate that improves upon the state-of-the-art non-accelerated rate.

Local SGD Converges Fast and Communicates Little

TLDR
Concise convergence rates are proved for local SGD on convex problems, showing that it converges at the same rate as mini-batch SGD in terms of the number of evaluated gradients; that is, the scheme achieves linear speedup in the number of workers and the mini-batch size.

Advances and Open Problems in Federated Learning

TLDR
Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.