• Corpus ID: 239998407

Improved Analysis and Rates for Variance Reduction under Without-replacement Sampling Orders

  • Xinmeng Huang, K. Yuan, Xianghui Mao, Wotao Yin
  • Published 25 April 2021
  • Computer Science, Mathematics
When applying a stochastic algorithm, one must choose an order in which to draw samples. The practical choices are without-replacement sampling orders, which are empirically faster and more cache-friendly than uniform i.i.d. sampling but often have inferior theoretical guarantees. Without-replacement sampling is well understood only for SGD without variance reduction. In this paper, we improve the convergence analysis and rates of variance reduction under without-replacement sampling orders for… 
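To make the contrast in the abstract concrete, here is a minimal sketch of the two sampling orders on a toy least-squares problem. All names, dimensions, and step sizes are illustrative and not taken from the paper:

```python
import numpy as np

def sgd(A, b, order_fn, epochs=50, lr=0.01, seed=0):
    """Minimize (1/2n)||Ax - b||^2 with one-sample SGD steps.

    order_fn(rng, n) yields the index order used within one epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        for i in order_fn(rng, n):
            grad_i = (A[i] @ x - b[i]) * A[i]  # gradient of the i-th term
            x -= lr * grad_i
    return x

# Uniform i.i.d. sampling: every draw is independent (with replacement).
iid = lambda rng, n: rng.integers(0, n, size=n)

# Without-replacement (random reshuffling): a fresh permutation per epoch,
# so each sample is visited exactly once per pass.
reshuffle = lambda rng, n: rng.permutation(n)

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 5))
x_star = rng.standard_normal(5)
b = A @ x_star  # noiseless targets, so both variants can converge exactly

err_iid = np.linalg.norm(sgd(A, b, iid) - x_star)
err_rr = np.linalg.norm(sgd(A, b, reshuffle) - x_star)
```

On hardware, the permutation variant also streams through memory in a cache-friendlier pattern than random index lookups, which is one of the practical reasons cited in the abstract.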
1 Citation

Convergence of Random Reshuffling Under The Kurdyka-Łojasiewicz Inequality
Under the well-known Kurdyka-Łojasiewicz (KL) inequality, strong limit-point convergence results for RR with appropriate diminishing step sizes are established, namely, the whole sequence of iterates generated by RR is convergent and converges to a single stationary point in an almost sure sense.


References
Variance-Reduced Stochastic Learning Under Random Reshuffling
A theoretical guarantee of linear convergence under random reshuffling for SAGA in the mean-square sense is provided and a new amortized variance-reduced gradient (AVRG) algorithm with constant storage requirements and balanced gradient computations compared to SVRG is proposed.
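As a rough illustration of combining variance reduction with reshuffled orders, the following runs a SAGA-style update under per-epoch permutations on a toy quadratic. This is a sketch under assumed step sizes, not the AVRG algorithm itself (AVRG differs in how it amortizes gradient bookkeeping to keep storage constant):

```python
import numpy as np

def saga_rr(A, b, epochs=100, lr=0.01, seed=0):
    """SAGA-style variance reduction with random-reshuffling orders
    on (1/2n)||Ax - b||^2.

    Keeps the most recent per-sample gradient in a table and uses the
    table average as a control variate, as in standard SAGA; the only
    change is that indices are drawn without replacement each epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    table = np.zeros((n, d))           # last stored gradient per sample
    avg = table.mean(axis=0)           # running average of the table
    for _ in range(epochs):
        for i in rng.permutation(n):   # without-replacement order
            g = (A[i] @ x - b[i]) * A[i]
            x -= lr * (g - table[i] + avg)  # variance-reduced step
            avg += (g - table[i]) / n       # keep the average in sync
            table[i] = g
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 5))
x_star = rng.standard_normal(5)
b = A @ x_star

err = np.linalg.norm(saga_rr(A, b) - x_star)
```

Note the O(n·d) gradient table: the constant-storage property claimed for AVRG is precisely about avoiding this table, which SAGA requires.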
How Good is SGD with Random Shuffling?
This paper proves that after $k$ passes over $n$ individual functions, if the functions are re-shuffled after every pass, the best possible optimization error for SGD is at least $\Omega\left(1/(nk)^2 + 1/(nk^3)\right)$, which partially corresponds to recently derived upper bounds.
Random Shuffling Beats SGD after Finite Epochs
It is proved that under strong convexity and second-order smoothness, the sequence generated by RandomShuffle converges to the optimal solution at the rate $\mathcal{O}(1/T^2 + n^3/T^3)$, where $n$ is the number of components in the objective and $T$ is the total number of iterations.
Stochastic Learning Under Random Reshuffling With Constant Step-Sizes
The analysis establishes that random reshuffling outperforms uniform sampling and derives an analytical expression for the steady-state mean-square-error performance of the algorithm, clarifying in greater detail the differences between sampling with and without replacement.
Stochastic gradient descent with finite samples sizes
This work draws from recent results in the field of online adaptation to derive new tight performance expressions for empirical implementations of stochastic gradient descent, mini-batch gradient descent, and importance sampling, and proposes an optimal importance-sampling algorithm to optimize performance.
Why random reshuffling beats stochastic gradient descent
The convergence rate of the random reshuffling method is analyzed, and it is shown that when the component functions are quadratics or smooth and the sum function is strongly convex, RR with iterate averaging and a diminishing stepsize $\Theta(1/k^{s})$ converges at rate $\Theta(1/k^{2s})$ with probability one in the suboptimality of the objective value, thus improving upon the $\Omega(1/k)$ rate of SGD.
MISO is Making a Comeback With Better Proofs and Rates
MISO, also known as Finito, was one of the first stochastic variance-reduced methods discovered, yet its popularity is fairly low. Its initial analysis was significantly limited by the so-called Big…
Random Reshuffling with Variance Reduction: New Analysis and Better Rates
This work provides the first analysis of SVRG under Random Reshuffling (RR-SVRG) for general finite-sum problems and obtains the first sublinear rate for general convex problems.
Stochastic Optimization with Importance Sampling for Regularized Loss Minimization
Stochastic optimization, including prox-SMD and prox-SDCA, is studied with importance sampling, which improves the convergence rate by reducing the stochastic variance; the proposed schemes are analyzed theoretically and their effectiveness is validated empirically.
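A hedged sketch of the importance-sampling idea: draw samples with probability proportional to each term's smoothness constant and reweight the gradient so the estimate stays unbiased. The problem, step size, and probabilities below are illustrative, not the prox-SMD/prox-SDCA setting of that paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
# Heterogeneous row scales make uniform sampling high-variance.
A = rng.standard_normal((n, d)) * rng.uniform(0.1, 3.0, size=(n, 1))
x_star = rng.standard_normal(d)
b = A @ x_star

# Sample proportionally to per-term smoothness L_i = ||a_i||^2 ...
L = np.sum(A**2, axis=1)
p = L / L.sum()

x = np.zeros(d)
lr = 0.5 / L.mean()  # illustrative step size, not from the paper
for _ in range(5000):
    i = rng.choice(n, p=p)
    # ... and reweight by 1/(n p_i) so the estimate of the full
    # gradient of (1/2n)||Ax - b||^2 remains unbiased.
    g = (A[i] @ x - b[i]) * A[i] / (n * p[i])
    x -= lr * g

err = np.linalg.norm(x - x_star)
```

With this choice of $p_i$, every reweighted per-term gradient has the same effective smoothness $\bar{L} = \frac{1}{n}\sum_i L_i$, which is the variance-equalizing effect the importance-sampling analysis exploits.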
SGD without Replacement: Sharper Rates for General Smooth Convex Functions
The first non-asymptotic results for stochastic gradient descent without replacement applied to general smooth, strongly-convex functions are provided, showing that SGD without replacement converges at a rate of $O(1/K^2)$ while SGD with replacement is known to converge at an $O(1/K)$ rate.