# On Tight Convergence Rates of Without-replacement SGD

@article{Ahn2020OnTC, title={On Tight Convergence Rates of Without-replacement SGD}, author={Kwangjun Ahn and Suvrit Sra}, journal={ArXiv}, year={2020}, volume={abs/2004.08657} }

For solving finite-sum optimization problems, SGD with without-replacement sampling is empirically observed to outperform standard SGD. Denoting by $n$ the number of components in the cost and by $K$ the number of epochs of the algorithm, several recent works have shown convergence rates of without-replacement SGD with better dependence on $n$ and $K$ than the baseline rate of $O(1/(nK))$ for SGD. However, there are two main limitations shared among those works: the rates have extra poly-logarithmic factors…
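To make the comparison concrete, here is a minimal, hypothetical sketch of the two sampling schemes on a toy one-dimensional quadratic finite sum; the components, step size, and epoch count are illustrative assumptions, not taken from the paper.

```python
import random

# Toy finite-sum objective: F(x) = (1/n) * sum_i 0.5 * (x - a[i])^2,
# minimized at mean(a). Components and step size are illustrative only.

def sgd_with_replacement(a, x, lr, epochs):
    n = len(a)
    for _ in range(n * epochs):          # nK stochastic steps in total
        i = random.randrange(n)          # sample a component WITH replacement
        x -= lr * (x - a[i])             # gradient of 0.5 * (x - a[i])^2
    return x

def sgd_without_replacement(a, x, lr, epochs):
    n = len(a)
    for _ in range(epochs):
        order = list(range(n))
        random.shuffle(order)            # fresh permutation each epoch
        for i in order:                  # visit every component exactly once
            x -= lr * (x - a[i])
    return x

random.seed(0)
a = [1.0, 2.0, 3.0, 4.0]
print(sgd_without_replacement(a, 0.0, 0.05, epochs=200))  # near mean(a) = 2.5
```

Both variants approach the minimizer here; the papers surveyed below quantify how much faster the without-replacement iterates close the remaining gap as a function of $n$ and $K$.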

## 4 Citations

### Stochastic Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization

- Computer Science, NeurIPS
- 2020

Stochastic Variance Reduction via Accelerated Dual Averaging improves the complexity of the best known methods without using any additional strategy such as optimal black-box reduction, and it leads to a unified convergence analysis and a simplified algorithm for both the nonstrongly convex and strongly convex settings.

### Random Reshuffling: Simple Analysis with Vast Improvements

- Computer Science, NeurIPS
- 2020

The theory for strongly convex objectives tightly matches the known lower bounds for both Random Reshuffling (RR) and Shuffle-Once (SO), proves fast convergence of the Shuffle-Once algorithm, which shuffles the data only once, and substantiates the common practical heuristic of shuffling once or only a few times.
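The distinction between the two schemes is small but matters for the analysis: RR draws a fresh permutation every epoch, while SO draws one permutation up front and reuses it. A minimal sketch on toy quadratic components (an illustrative setup, not the paper's):

```python
import random

def permuted_sgd(a, x, lr, epochs, reshuffle):
    """One pass per epoch over toy components 0.5 * (x - a[i])^2.

    reshuffle=True  -> Random Reshuffling: new permutation every epoch
    reshuffle=False -> Shuffle-Once: a single permutation reused every epoch
    """
    order = list(range(len(a)))
    random.shuffle(order)                # the Shuffle-Once permutation
    for _ in range(epochs):
        if reshuffle:
            random.shuffle(order)        # RR redraws the order each epoch
        for i in order:
            x -= lr * (x - a[i])
    return x

random.seed(0)
a = [1.0, 2.0, 3.0, 4.0]
print(permuted_sgd(a, 0.0, 0.05, 100, reshuffle=True))   # RR, near 2.5
print(permuted_sgd(a, 0.0, 0.05, 100, reshuffle=False))  # SO, near 2.5
```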

### FedShuffle: Recipes for Better Use of Local Work in Federated Learning

- Computer Science, ArXiv
- 2022

This work presents a comprehensive theoretical analysis of FedShuffle and shows that it does not suffer from the objective function mismatch that is present in FL methods that assume homogeneous updates in heterogeneous FL setups, such as FedAvg (McMahan et al., 2017).

### Federated Learning with Regularized Client Participation

- Computer Science, ArXiv
- 2023

This research proposes a new technique and designs a novel regularized client participation scheme that reduces the variance caused by client sampling and, combined with the popular FedAvg algorithm, yields superior rates under standard assumptions.

## References

Showing 1-10 of 14 references.

### Closing the convergence gap of SGD without replacement

- Computer Science, Mathematics, ICML
- 2020

It is shown that SGD without replacement achieves a rate of $\mathcal{O}\left(\frac{1}{T^2}+\frac{n^2}{T^3}\right)$ when the sum of the functions is a quadratic, and a new lower bound of $\Omega\left(\frac{n}{T^2}\right)$ is offered for strongly convex functions that are sums of smooth functions.

### How Good is SGD with Random Shuffling?

- Computer Science, COLT 2019
- 2019

This paper proves that after $k$ passes over the $n$ individual functions, if the functions are re-shuffled after every pass, the best possible optimization error for SGD is at least $\Omega\left(1/(nk)^2+1/(nk^3)\right)$, which partially corresponds to recently derived upper bounds.

### SGD without Replacement: Sharper Rates for General Smooth Convex Functions

- Computer Science, ICML
- 2019

The first non-asymptotic results for stochastic gradient descent without replacement applied to general smooth, strongly convex functions are provided, showing that SGD without replacement converges at a rate of $O(1/K^2)$ while SGD is known to converge at an $O(1/K)$ rate.

### Random Shuffling Beats SGD after Finite Epochs

- Computer Science, ICML
- 2019

It is proved that under strong convexity and second-order smoothness, the sequence generated by RandomShuffle converges to the optimal solution at the rate $O(1/T^2 + n^3/T^3)$, where $n$ is the number of components in the objective and $T$ is the total number of iterations.

### Without-Replacement Sampling for Stochastic Gradient Methods

- Computer Science, NIPS
- 2016

This paper provides competitive convergence guarantees for without-replacement sampling under several scenarios, focusing on the natural regime of few passes over the data, yielding a nearly-optimal algorithm for regularized least squares under broad parameter regimes.

### A Unified Convergence Analysis for Shuffling-Type Gradient Methods

- Computer Science, Mathematics, J. Mach. Learn. Res.
- 2021

This paper provides a unified convergence analysis for a class of shuffling-type gradient methods for solving a well-known finite-sum minimization problem commonly used in machine learning and introduces new non-asymptotic and asymptotic convergence rates.

### Convergence Rate of Incremental Subgradient Algorithms

- Mathematics, Computer Science
- 2001

An incremental approach to minimizing a convex function that is the sum of a large number of component functions is considered; such approaches have been very successful in solving large differentiable least squares problems, such as those arising in the training of neural networks.

### Why random reshuffling beats stochastic gradient descent

- Computer Science, Mathematics, Math. Program.
- 2021

This paper provides various convergence rate results for RR and variants when the sum function is strongly convex, and shows that when the component functions are quadratics or smooth (with a Lipschitz assumption on the Hessian matrices), the error of RR with iterate averaging and a diminishing stepsize $\alpha_k = \Theta(1/k^s)$ converges to zero.
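As a concrete, hypothetical illustration of that schedule, the sketch below runs RR with stepsize $\alpha_k = c/k^s$ and a running average of the iterates on toy quadratic components; the constants $c$, $s$ and the objective are assumptions for illustration, not taken from the paper.

```python
import random

# Hypothetical sketch: RR with diminishing stepsize alpha_k = c / k**s and an
# incremental average of the iterates, on toy components 0.5 * (x - a[i])^2.
def rr_averaged(a, x, epochs, c=1.0, s=0.75):
    n, avg, count = len(a), 0.0, 0
    for k in range(1, epochs + 1):
        lr = c / k**s                    # alpha_k = Theta(1/k^s)
        order = list(range(n))
        random.shuffle(order)            # random reshuffling each epoch
        for i in order:
            x -= lr * (x - a[i])
            count += 1
            avg += (x - avg) / count     # running average of all iterates
    return avg

random.seed(0)
print(rr_averaged([1.0, 3.0], 0.0, epochs=300))  # approaches the minimizer 2.0
```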

### Curiously Fast Convergence of some Stochastic Gradient Descent Algorithms

- Mathematics, Computer Science
- 2009

This work considers three ways to pick the example $z[t]$ at each iteration of a stochastic gradient algorithm for minimizing the cost function $\min_\theta C(\theta) = \frac{1}{m}\sum_{i=1}^{m} \ell(z_i, \theta)$.
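Selection schemes of this kind can be written as index generators. The three labels below are descriptive assumptions for illustration, not the paper's own terminology:

```python
import random

def index_stream(m, passes, scheme):
    """Yield example indices for a stochastic gradient run over m examples.

    Three common selection schemes (labels assumed here for illustration):
      "random"  - draw i uniformly with replacement at every iteration
      "cycle"   - repeat one fixed order on every pass
      "shuffle" - draw a fresh random permutation on every pass
    """
    base = list(range(m))
    for _ in range(passes):
        if scheme == "random":
            for _ in range(m):
                yield random.randrange(m)
        else:
            if scheme == "shuffle":
                random.shuffle(base)
            yield from base

print(list(index_stream(3, 2, "cycle")))  # [0, 1, 2, 0, 1, 2]
```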

### A Stochastic Approximation Method

- Mathematics
- 2007

Let $M(x)$ denote the expected value at level $x$ of the response to a certain experiment. $M(x)$ is assumed to be a monotone function of $x$ but is unknown to the experimenter, and it is desired to find the…