Corpus ID: 239049777

Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond

@article{Yun2021MinibatchVL,
  title={Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond},
  author={Chulhee Yun and Shashank Rajput and Suvrit Sra},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.10342}
}
In distributed learning, local SGD (also known as federated averaging) and its simple baseline minibatch SGD are widely studied optimization methods. Most existing analyses of these methods assume independent and unbiased gradient estimates obtained via with-replacement sampling. In contrast, we study shuffling-based variants: minibatch and local Random Reshuffling, which draw stochastic gradients without replacement and are thus closer to practice. For smooth functions satisfying the Polyak-Łojasiewicz…
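
The following minimal Python sketch (not from the paper; the toy least-squares objective and all function names are assumptions for illustration) contrasts the two sampling schemes: classical with-replacement minibatch sampling versus the shuffling-based sampling studied here, which permutes the data once per epoch and sweeps it without replacement.

import numpy as np

rng = np.random.default_rng(0)
n, d, batch = 32, 5, 8
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad(x, idx):
    # Gradient of the least-squares loss restricted to the sampled rows.
    r = A[idx] @ x - b[idx]
    return A[idx].T @ r / len(idx)

def with_replacement_epoch(x, lr=0.01):
    # Classical minibatch SGD: every batch is an independent uniform sample with replacement.
    for _ in range(n // batch):
        idx = rng.integers(0, n, size=batch)
        x = x - lr * grad(x, idx)
    return x

def random_reshuffling_epoch(x, lr=0.01):
    # Shuffling-based minibatch SGD (Random Reshuffling): permute once, then sweep without replacement.
    perm = rng.permutation(n)
    for s in range(0, n, batch):
        x = x - lr * grad(x, perm[s:s + batch])
    return x

x = np.zeros(d)
for _ in range(100):
    x = random_reshuffling_epoch(x)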

Server-Side Stepsizes and Sampling Without Replacement Provably Help in Federated Optimization

The results are the first to show that the widely popular heuristic of scaling the client updates with an extra parameter is very useful in the context of Federated Averaging with local passes over the client data, and the first to show that local steps provably help to overcome the communication bottleneck.

On Server-Side Stepsizes in Federated Optimization: Theory Explaining the Heuristics

This work is the first to show that the widely popular heuristic of scaling the client updates with an extra parameter is extremely useful in the context of Federated Averaging with local passes over the client data; it also proves that whenever the local stepsizes are small and the update direction is given by FedAvg over all clients, one can take a big leap in the obtained direction.
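
A minimal sketch of the heuristic both of these summaries refer to, assuming a toy least-squares setup (the client data and the local_pass and fedavg_round names are illustrative, not the papers' implementations): after the clients run local shuffled passes, the server scales the averaged client update by an extra server-side stepsize before applying it.

import numpy as np

rng = np.random.default_rng(1)
num_clients, n, d = 4, 20, 5
clients = [(rng.standard_normal((n, d)), rng.standard_normal(n)) for _ in range(num_clients)]

def local_pass(x, A, b, lr=0.01, batch=5):
    # One local epoch of Random Reshuffling on this client's least-squares loss.
    perm = rng.permutation(len(b))
    for s in range(0, len(b), batch):
        idx = perm[s:s + batch]
        x = x - lr * (A[idx].T @ (A[idx] @ x - b[idx])) / len(idx)
    return x

def fedavg_round(x, local_lr=0.01, server_lr=1.0, local_epochs=2):
    # Each client starts from the server model and runs several local shuffled passes.
    updates = []
    for A, b in clients:
        x_i = x.copy()
        for _ in range(local_epochs):
            x_i = local_pass(x_i, A, b, lr=local_lr)
        updates.append(x_i - x)
    # Server-side stepsize: scale the averaged client update before applying it.
    return x + server_lr * np.mean(updates, axis=0)

x = np.zeros(d)
for _ in range(50):
    x = fedavg_round(x, server_lr=1.5)  # server_lr > 1 is the extra scaling parameter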

Federated Unsupervised Clustering with Generative Models

UIFCA is developed using generative models within the IFCA framework; it handles a more general setting where the data in the same client can also come from different clusters, and it correctly recovers the cluster information of individual datapoints.

A Unified Analysis of Federated Learning with Arbitrary Client Participation

This paper introduces a generalized version of federated averaging (FedAvg) that amplifies parameter updates at an interval of multiple FL rounds and presents a novel analysis that captures the effect of client participation in a single term, obtaining convergence upper bounds for a wide range of participation patterns.

Federated Optimization Algorithms with Random Reshuffling and Gradient Compression

This work develops a distributed variant of random reshuffling with gradient compression (Q-RR), shows how to reduce the variance introduced by gradient quantization through the use of control iterates, and proposes a variant of Q-RR, called Q-NASTYA, that is a better fit for federated learning applications.
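
A hedged sketch of the control-iterate idea described above, in the spirit of difference compression; the rand-k compressor, the toy per-client quadratics, and all names are assumptions for illustration, and full gradients stand in for reshuffled minibatches to keep the sketch short. Each client compresses only the difference between its gradient and a control iterate, so the quantization error transmitted to the server shrinks as the control iterates catch up.

import numpy as np

rng = np.random.default_rng(2)
d, num_clients = 10, 4
# Toy quadratics f_i(x) = 0.5 * ||x - c_i||^2, one per client; the gradient is x - c_i.
centers = [rng.standard_normal(d) for _ in range(num_clients)]

def rand_k(v, k=3):
    # Unbiased random-sparsification compressor: keep k coordinates and rescale.
    idx = rng.choice(len(v), size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * (len(v) / k)
    return out

x = np.zeros(d)
h = [np.zeros(d) for _ in range(num_clients)]  # control iterates, one per client
lr, alpha = 0.1, 0.3
for _ in range(300):
    estimates = []
    for i, c in enumerate(centers):
        g = x - c                       # client gradient
        delta = rand_k(g - h[i])        # compress only the difference to the control iterate
        estimates.append(h[i] + delta)  # server-side reconstruction of the client gradient
        h[i] = h[i] + alpha * delta     # control iterate drifts toward the true gradient
    x = x - lr * np.mean(estimates, axis=0)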

FedShuffle: Recipes for Better Use of Local Work in Federated Learning

This work presents a comprehensive theoretical analysis of FedShuffle and shows that it does not suffer from the objective function mismatch that is present in FL methods that assume homogeneous updates in heterogeneous FL setups, such as FedAvg (McMahan et al., 2017).

SGDA with shuffling: faster convergence for nonconvex-PŁ minimax optimization

This work studies the convergence bounds of SGDA with random reshuffling (SGDA-RR) for smooth nonconvex-nonconcave objectives with Polyak-Łojasiewicz (PŁ) geometry, and presents a comprehensive lower bound for two-time-scale GDA, which matches the full-batch rate in the primal-PŁ-PŁ case.
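
A minimal sketch of SGDA with random reshuffling on a toy finite-sum quadratic minimax problem (the objective, stepsizes, and names are assumptions for illustration): each epoch permutes the component indices once and then takes simultaneous descent-ascent steps without replacement, with different stepsizes for the two players.

import numpy as np

rng = np.random.default_rng(3)
n = 16
a = rng.uniform(0.5, 1.5, n)   # x-curvature of each component
b = rng.uniform(-1.0, 1.0, n)  # bilinear coupling of each component
c = rng.uniform(0.5, 1.5, n)   # y-curvature of each component

def grads(x, y, i):
    # Gradients of f_i(x, y) = 0.5*a_i*x^2 + b_i*x*y - 0.5*c_i*y^2.
    return a[i] * x + b[i] * y, b[i] * x - c[i] * y

x, y = 1.0, -1.0
lr_x, lr_y = 0.01, 0.05        # two-time-scale stepsizes
for epoch in range(200):
    for i in rng.permutation(n):                 # one shuffled pass without replacement
        gx, gy = grads(x, y, i)
        x, y = x - lr_x * gx, y + lr_y * gy      # simultaneous descent (x) and ascent (y)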

Efficiency Ordering of Stochastic Gradient Descent

It is demonstrated how certain non-Markovian processes, for which typical mixing-time-based non-asymptotic bounds are intractable, can outperform their Markovian counterparts in the sense of efficiency ordering for SGD.

Provable Adaptivity in Adam

It is argued that Adam can adapt to the local smoothness condition, justifying the adaptivity of Adam and shedding light on the benefit of adaptive gradient methods over non-adaptive ones.
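
For reference, a minimal sketch of the standard Adam update this summary refers to (the toy objective and hyperparameters are assumptions): the coordinate-wise denominator sqrt(v_hat) shrinks the effective stepsize where recent gradients are large, which is the sense in which Adam can adapt to local smoothness.

import numpy as np

def adam_step(x, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # One standard Adam update; sqrt(v_hat) rescales the stepsize coordinate-wise.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

# Toy usage on f(x) = 0.5 * ||x||^2, whose gradient at x is x itself.
x = np.ones(3)
m = np.zeros(3)
v = np.zeros(3)
for t in range(1, 1001):
    x, m, v = adam_step(x, x.copy(), m, v, t)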