• Corpus ID: 239998407

# Improved Analysis and Rates for Variance Reduction under Without-replacement Sampling Orders

@inproceedings{Huang2021ImprovedAA,
title={Improved Analysis and Rates for Variance Reduction under Without-replacement Sampling Orders},
author={Xinmeng Huang and K. Yuan and Xianghui Mao and Wotao Yin},
year={2021}
}
• Xinmeng Huang, K. Yuan, Xianghui Mao, Wotao Yin
• Published 25 April 2021
• Computer Science, Mathematics
When applying a stochastic algorithm, one must choose an order in which to draw samples. The practical choices are without-replacement sampling orders, which are empirically faster and more cache-friendly than uniform i.i.d. sampling but often carry inferior theoretical guarantees. Without-replacement sampling is well understood only for SGD without variance reduction. In this paper, we improve the convergence analysis and rates of variance reduction under without-replacement sampling orders for…
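As a concrete illustration of the two sample orders contrasted in the abstract, the sketch below generates both (the function names and parameters are ours, not the paper's):

```python
import random

def iid_order(n, epochs, rng):
    """Uniform i.i.d. sampling: one index per step, drawn with replacement,
    so a sample may repeat or be skipped entirely within an epoch."""
    return [rng.randrange(n) for _ in range(n * epochs)]

def reshuffled_order(n, epochs, rng):
    """Random reshuffling, a without-replacement order: a fresh permutation
    of all n indices each epoch, so every sample is visited exactly once."""
    order = []
    for _ in range(epochs):
        perm = list(range(n))
        rng.shuffle(perm)
        order.extend(perm)
    return order

rng = random.Random(0)
n, epochs = 5, 3
rr = reshuffled_order(n, epochs, rng)
iid = iid_order(n, epochs, rng)
```

Each epoch-sized slice of `rr` is a permutation of {0, …, n−1}, whereas an epoch of `iid` generally repeats some indices and misses others.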
## Citations

Convergence of Random Reshuffling Under The Kurdyka-Łojasiewicz Inequality
• Computer Science, Mathematics
ArXiv
• 2021
Under the well-known Kurdyka-Łojasiewicz (KL) inequality, strong limit-point convergence results for RR with appropriate diminishing step sizes are established, namely, the whole sequence of iterates generated by RR is convergent and converges to a single stationary point in an almost sure sense.

## References

SHOWING 1-10 OF 43 REFERENCES
Variance-Reduced Stochastic Learning Under Random Reshuffling
• Computer Science, Mathematics
IEEE Transactions on Signal Processing
• 2020
A theoretical guarantee of linear convergence under random reshuffling for SAGA in the mean-square sense is provided and a new amortized variance-reduced gradient (AVRG) algorithm with constant storage requirements and balanced gradient computations compared to SVRG is proposed.
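A minimal sketch of the kind of algorithm this entry studies: SAGA with a random-reshuffling sample order, run on a toy least-squares problem. The problem sizes, step size, and variable names below are our illustrative choices, not the paper's:

```python
import numpy as np

# Toy problem: minimize f(x) = (1/n) * sum_i 0.5 * (a_i @ x - b_i)^2.
# The system is consistent by construction, so the minimizer is x_star.
rng = np.random.default_rng(0)
n, d = 50, 5
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star

x = np.zeros(d)
table = np.zeros((n, d))          # SAGA's stored per-sample gradients
table_avg = table.mean(axis=0)    # running mean of the table
lr = 0.02

for epoch in range(500):
    for i in rng.permutation(n):  # random reshuffling: without replacement
        g_i = (A[i] @ x - b[i]) * A[i]           # fresh gradient of sample i
        x -= lr * (g_i - table[i] + table_avg)   # variance-reduced step
        table_avg += (g_i - table[i]) / n        # keep the running mean exact
        table[i] = g_i

err = float(np.linalg.norm(x - x_star))
```

Replacing `rng.permutation(n)` with i.i.d. draws recovers standard SAGA; the point of the cited analysis is that the permuted order above still enjoys linear convergence in the mean-square sense.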
How Good is SGD with Random Shuffling?
• Computer Science, Mathematics
COLT
• 2019
This paper proves that after $k$ passes over the $n$ individual functions, if the functions are re-shuffled after every pass, the best possible optimization error for SGD is at least $\Omega(1/(nk)^2 + 1/(nk^3))$, which partially corresponds to recently derived upper bounds.
Random Shuffling Beats SGD after Finite Epochs
• Mathematics, Computer Science
ICML
• 2019
It is proved that under strong convexity and second-order smoothness, the sequence generated by RandomShuffle converges to the optimal solution at the rate $O(1/T^2 + n^3/T^3)$, where $n$ is the number of components in the objective and $T$ is the total number of iterations.
Stochastic Learning Under Random Reshuffling With Constant Step-Sizes
• Computer Science, Mathematics
IEEE Transactions on Signal Processing
• 2019
The analysis establishes analytically that random reshuffling outperforms uniform sampling and derives an analytical expression for the steady-state mean-square-error performance of the algorithm, which clarifies in greater detail the differences between sampling with and without replacement.
Stochastic gradient descent with finite samples sizes
• Mathematics, Computer Science
2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP)
• 2016
This work draws from recent results in the field of online adaptation to derive new tight performance expressions for empirical implementations of stochastic gradient descent, mini-batch gradient descent, and importance sampling, and proposes an optimal importance sampling algorithm to optimize performance.
Why random reshuffling beats stochastic gradient descent
• Computer Science, Mathematics
Math. Program.
• 2021
The convergence rate of the random reshuffling (RR) method is analyzed, and it is shown that when the component functions are quadratics or smooth and the sum function is strongly convex, RR with iterate averaging and a diminishing stepsize converges at rate $\Theta(1/k^{2s})$ with probability one in the suboptimality of the objective value, thus improving upon the $\Omega(1/k)$ rate of SGD.
MISO is Making a Comeback With Better Proofs and Rates
• Mathematics
• 2019
MISO, also known as Finito, was one of the first stochastic variance-reduced methods discovered, yet its popularity is fairly low. Its initial analysis was significantly limited by the so-called Big…
Random Reshuffling with Variance Reduction: New Analysis and Better Rates
• Computer Science, Mathematics
ArXiv
• 2021
This work provides the first analysis of SVRG under Random Reshuffling (RR-SVRG) for general finite-sum problems and obtains the first sublinear rate for general convex problems.
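The flavor of SVRG under random reshuffling can be sketched as follows. This is a simplified illustration in our own notation on a toy quadratic, not the paper's RR-SVRG pseudocode:

```python
import numpy as np

# Toy least-squares problem with a consistent system, minimizer x_star.
rng = np.random.default_rng(1)
n, d = 40, 4
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star

def grad_i(x, i):
    """Gradient of the i-th component 0.5 * (a_i @ x - b_i)^2."""
    return (A[i] @ x - b[i]) * A[i]

x = np.zeros(d)
lr = 0.05
for outer in range(50):
    snapshot = x.copy()
    full_grad = (A.T @ (A @ snapshot - b)) / n   # exact gradient at snapshot
    for i in rng.permutation(n):                 # one reshuffled inner pass
        # SVRG control variate: unbiased only under i.i.d. sampling, but the
        # cited work shows the permuted order still converges (linearly in
        # the strongly convex case).
        x -= lr * (grad_i(x, i) - grad_i(snapshot, i) + full_grad)

err = float(np.linalg.norm(x - x_star))
```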
Stochastic Optimization with Importance Sampling for Regularized Loss Minimization
• Mathematics, Computer Science
ICML
• 2015
Stochastic optimization methods, including prox-SMD and prox-SDCA, are studied with importance sampling, which improves the convergence rate by reducing the stochastic variance; their effectiveness is analyzed theoretically and validated empirically.
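The reweighting trick behind importance sampling can be illustrated on a least-squares toy problem. This is a generic SGD sketch in our own notation, not the prox-SMD/prox-SDCA methods of the paper:

```python
import numpy as np

# Toy problem: minimize (1/n) * sum_i 0.5 * (a_i @ x - b_i)^2, with a
# consistent system so the minimizer is x_star.
rng = np.random.default_rng(2)
n, d = 60, 5
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star

L = np.sum(A * A, axis=1)        # per-sample smoothness constants ||a_i||^2
p = L / L.sum()                  # sample i with probability proportional to L_i

x = np.zeros(d)
lr = 0.5 / L.mean()
for t in range(20000):
    i = rng.choice(n, p=p)
    g = (A[i] @ x - b[i]) * A[i]
    x -= lr * g / (n * p[i])     # 1/(n p_i) reweighting keeps the step unbiased
err = float(np.linalg.norm(x - x_star))
```

A side effect of the $1/(n p_i)$ reweighting is that the effective step on each sample scales with the *average* smoothness constant rather than the worst-case one, which is the source of the variance reduction.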
SGD without Replacement: Sharper Rates for General Smooth Convex Functions
• Mathematics, Computer Science
ICML
• 2019
The first non-asymptotic results are provided for stochastic gradient descent without replacement applied to general smooth, strongly convex functions, showing that SGD without replacement converges at a rate of $O(1/K^2)$ while SGD is known to converge at an $O(1/K)$ rate.