On the Unreasonable Effectiveness of Federated Averaging with Heterogeneous Data

Jianyu Wang, Rudrajit Das, Gauri Joshi, Satyen Kale, Zheng Xu, Tong Zhang
Existing theory predicts that data heterogeneity will degrade the performance of the Federated Averaging (FedAvg) algorithm in federated learning. However, in practice, the simple FedAvg algorithm converges very well. This paper explains the seemingly unreasonable effectiveness of FedAvg that contradicts the previous theoretical predictions. We find that the key assumption of bounded gradient dissimilarity in previous theoretical analyses is too pessimistic to characterize data heterogeneity in… 
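
For readers unfamiliar with the algorithm, here is a minimal sketch of FedAvg on a toy problem. It is not taken from the paper: the scalar quadratic client losses f_i(w) = 0.5 (w - c_i)^2, the names `local_sgd`, `fedavg`, and `centers`, and all hyperparameters are assumptions chosen for illustration. The clients are deliberately heterogeneous (their minimizers c_i differ widely), yet in this simple setting the averaged model still reaches the minimizer of the average loss.

```python
from statistics import mean


def local_sgd(w, center, lr, local_steps):
    """Run `local_steps` gradient steps on the toy loss 0.5 * (w - center)**2."""
    for _ in range(local_steps):
        grad = w - center  # gradient of the toy quadratic loss
        w -= lr * grad
    return w


def fedavg(centers, w0=0.0, lr=0.1, local_steps=5, rounds=50):
    """Toy FedAvg: every client trains from the server model, then the server averages."""
    w = w0
    for _ in range(rounds):
        client_models = [local_sgd(w, c, lr, local_steps) for c in centers]
        w = mean(client_models)  # uniform average of the client models
    return w


# Hypothetical heterogeneous clients: each has a different local minimizer.
centers = [-3.0, 0.5, 10.0]
w_final = fedavg(centers)
print(w_final, mean(centers))
```

Because every toy client has the same curvature, the averaged iterate contracts toward the mean of the c_i each round. Real federated problems are not this benign, so the sketch only illustrates the mechanics of local training followed by averaging.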

TCT: Convexifying Federated Learning using Bootstrapped Neural Tangent Kernels

It is shown that the early layers of the network do learn useful features, but the final layers fail to make use of them, and federated optimization applied to this non-convex problem distorts the learning of the final layers.

Learning to Generate Image Embeddings with User-level Differential Privacy

DP-FedEmb, a variant of federated learning algorithms with per-user sensitivity control and noise addition, is proposed to train image embedding models using supervised training data centralized in the datacenter to achieve user-level DP for large image-to-embedding feature extractors.



On the Convergence of Local Descent Methods in Federated Learning

The obtained convergence rates are the sharpest known to date on the convergence of local descent methods with periodic averaging for solving nonconvex federated optimization in both centralized and networked distributed optimization.

Federated Optimization in Heterogeneous Networks

This work introduces a framework, FedProx, to tackle heterogeneity in federated networks, and provides convergence guarantees for this framework when learning over data from non-identical distributions (statistical heterogeneity), and while adhering to device-level systems constraints by allowing each participating device to perform a variable amount of work.
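
The proximal idea behind FedProx can be sketched as follows. This is an illustrative toy and not the authors' implementation: it assumes the local objective is the client loss plus a proximal term (mu / 2) * (w - w_global)^2, uses a scalar quadratic loss 0.5 * (w - center)^2, and the function name `fedprox_local_update` and all parameter values are hypothetical.

```python
def fedprox_local_update(w_global, center, mu, lr=0.1, local_steps=5):
    """Local gradient steps on 0.5*(w - center)**2 + (mu/2)*(w - w_global)**2."""
    w = w_global
    for _ in range(local_steps):
        grad_loss = w - center          # gradient of the toy client loss
        grad_prox = mu * (w - w_global)  # gradient of the proximal penalty
        w -= lr * (grad_loss + grad_prox)
    return w


w_global = 0.0
w_plain = fedprox_local_update(w_global, center=10.0, mu=0.0)  # no proximal term
w_prox = fedprox_local_update(w_global, center=10.0, mu=1.0)   # with proximal term
print(w_plain, w_prox)
```

With mu > 0 the local model drifts less far from the server model for the same number of steps. Since `local_steps` is a per-call argument, each device could also be given a different amount of work.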

Sharp Bounds for Federated Averaging (Local SGD) and Continuous Perspective

A lower bound is provided for FedAvg that matches the existing upper bound, which shows that the existing FedAvg upper bound analysis is not improvable, and a lower bound is established in the heterogeneous setting that nearly matches the existing upper bound.

Achieving Linear Speedup with Partial Worker Participation in Non-IID Federated Learning

It is shown that the federated averaging (FedAvg) algorithm (with two-sided learning rates) on non-i.i.d. datasets achieves a convergence rate of O(1/√(mKT) + 1/T) for full worker participation and O(√K/√(nT) + 1/T) for partial worker participation, which reveals that the local steps in FL could help the convergence.

Adaptive Federated Optimization

This work proposes federated versions of adaptive optimizers, including Adagrad, Adam, and Yogi, and analyzes their convergence in the presence of heterogeneous data for general nonconvex settings to highlight the interplay between client heterogeneity and communication efficiency.

Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization

This paper provides the first principled understanding of the solution bias and the convergence slowdown due to objective inconsistency and proposes FedNova, a normalized averaging method that eliminates objective inconsistency while preserving fast error convergence.
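
A toy illustration of objective inconsistency and of normalized averaging is sketched below. It is a simplification under stated assumptions and not the FedNova reference implementation: clients run plain gradient descent on scalar quadratics 0.5 * (w - center)^2 with different numbers of local steps, the normalized variant divides each client's update by its step count and rescales by the average step count, and the names `local_gd` and `run` are hypothetical.

```python
def local_gd(w, center, lr, steps):
    """Run `steps` gradient steps on the toy loss 0.5 * (w - center)**2."""
    for _ in range(steps):
        w -= lr * (w - center)
    return w


def run(centers, steps, normalized, lr=0.05, rounds=300, w0=0.0):
    """Toy server loop with uniform client weights and unequal local step counts."""
    n = len(centers)
    w = w0
    for _ in range(rounds):
        # Each client's total model change after its own number of local steps.
        deltas = [w - local_gd(w, c, lr, s) for c, s in zip(centers, steps)]
        if normalized:
            tau_eff = sum(steps) / n  # average number of local steps
            w -= tau_eff * sum(d / s for d, s in zip(deltas, steps)) / n
        else:
            w -= sum(deltas) / n  # plain averaging of the raw updates
    return w


centers = [0.0, 10.0]  # hypothetical client minimizers
steps = [1, 10]        # the second client does ten times more local work
target = sum(centers) / len(centers)  # minimizer of the average loss
w_avg = run(centers, steps, normalized=False)
w_nova = run(centers, steps, normalized=True)
print(target, w_avg, w_nova)
```

In this toy setup plain averaging is pulled strongly toward the client that takes more local steps, while the normalized variant lands much closer to the minimizer of the average loss, though not exactly on it, because the local gradients are evaluated at different points along each client's trajectory.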

Convergence and Accuracy Trade-Offs in Federated Learning and Meta-Learning

It is proved that for quadratic models, local update methods are equivalent to first-order optimization on a surrogate loss the authors exactly characterize, which sheds new light on a broad range of phenomena, including the efficacy of server momentum in federated learning and the impact of proximal client updates.

Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification

This work proposes a way to synthesize datasets with a continuous range of identicalness and provides performance measures for the Federated Averaging algorithm. It shows that performance degrades as distributions differ more, and proposes a mitigation strategy via server momentum.

On Large-Cohort Training for Federated Learning

This work explores how the number of clients sampled at each round (the cohort size) impacts the quality of the learned model and the training dynamics of federated learning algorithms.

A Field Guide to Federated Optimization

This paper provides recommendations and guidelines on formulating, designing, evaluating and analyzing federated optimization algorithms through concrete examples and practical implementation, with a focus on conducting effective simulations to infer real-world performance.