• Corpus ID: 246652213

# Nesterov Accelerated Shuffling Gradient Method for Convex Optimization

@inproceedings{Tran2022NesterovAS,
  title={Nesterov Accelerated Shuffling Gradient Method for Convex Optimization},
  author={Trang H. Tran and Lam M. Nguyen and Katya Scheinberg},
  booktitle={International Conference on Machine Learning},
  year={2022}
}
• Published in International Conference on Machine Learning, 7 February 2022
In this paper, we propose Nesterov Accelerated Shuffling Gradient (NASG), a new algorithm for convex finite-sum minimization problems. Our method integrates the traditional Nesterov acceleration momentum with different shuffling sampling schemes. We show that our algorithm achieves an improved rate of O(1/T) using unified shuffling schemes, where T is the number of epochs. This rate is better than that of any other shuffling gradient method in the convex regime. Our convergence analysis does…
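The abstract sketches the idea: run shuffled incremental gradient passes within each epoch and apply Nesterov-style momentum across epochs. Below is a minimal, hypothetical sketch of that epoch-wise structure, not the authors' exact NASG update; the least-squares objective, stepsize, and momentum weight `t/(t+3)` are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch: Nesterov-style extrapolation at epoch boundaries,
# with one full shuffled pass over component gradients inside each epoch.
# Illustrative problem: convex finite-sum least squares
#   f(w) = (1/n) * sum_i (a_i . w - b_i)^2.
def nasg_sketch(A, b, lr=0.01, epochs=300, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)        # anchor iterate (end of previous epoch)
    y = x.copy()           # extrapolated iterate the epoch starts from
    for t in range(epochs):
        w = y.copy()
        for i in rng.permutation(n):             # random-reshuffling scheme
            g = 2.0 * (A[i] @ w - b[i]) * A[i]   # one component gradient
            w = w - lr * g
        x_next = w
        beta = t / (t + 3.0)                     # classical Nesterov weight
        y = x_next + beta * (x_next - x)         # momentum across epochs
        x = x_next
    return x
```

On a consistent linear system (zero residual at the optimum), every component gradient vanishes at the solution, so the shuffled passes converge to it and the extrapolation step does not disturb the fixed point.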
## 2 Citations


• ArXiv, 2022 — Focuses on the shuffling version of SGD, which matches mainstream practical heuristics, and shows convergence of shuffling SGD to a global solution for a class of non-convex functions under overparameterized settings.

## References

Showing 1–10 of 57 references

• ICML, 2021 — Proposes Shuffling Momentum Gradient (SMG), a novel shuffling gradient-based method with momentum for non-convex finite-sum optimization problems, whose update is fundamentally different from those of existing momentum-based methods.
• J. Mach. Learn. Res., 2021 — Provides a unified convergence analysis for a class of shuffling-type gradient methods for solving the finite-sum minimization problem common in machine learning, and introduces new non-asymptotic and asymptotic convergence rates.
• NeurIPS, 2020 — Establishes minimax-optimal convergence rates for these algorithms up to poly-log factors and further sharpens the tight convergence results for RandomShuffle by removing drawbacks common to all prior art.
• ICML, 2019 — Proves that under strong convexity and second-order smoothness, the sequence generated by RandomShuffle converges to the optimal solution at the rate $O(1/T^2 + n^3/T^3)$, where n is the number of components in the objective and T is the total number of iterations.
• NIPS, 2009 — Proposes SAGE (Stochastic Accelerated GradiEnt), which exhibits fast convergence rates on stochastic composite optimization with convex or strongly convex objectives and extends to online learning, yielding a simple algorithm with the best regret bounds currently known for these problems.
• COLT, 2019 — Proves that after $k$ passes over the individual functions, if the functions are re-shuffled after every pass, the best possible optimization error for SGD is at least $\Omega\left(1/(nk)^2 + 1/(nk^3)\right)$, which partially corresponds to recently derived upper bounds.
• J. Mach. Learn. Res., 2011 — Describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and yields regret guarantees provably as good as those of the best proximal function chosen in hindsight.
• Math. Program., 2021 — Provides various convergence rate results for RR and its variants when the sum function is strongly convex, and shows that when the component functions are quadratics or smooth (with a Lipschitz assumption on the Hessian matrices), RR with iterate averaging and a diminishing stepsize $\alpha_k = \Theta(1/k^s)$ converges to zero.
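The RR-with-averaging scheme described in this entry can be sketched as follows. This is an illustrative reading, not the cited paper's algorithm: the constants `c` and `s`, the per-epoch stepsize schedule, and the least-squares objective are all assumptions made for the example.

```python
import numpy as np

# Illustrative sketch of Random Reshuffling (RR) with iterate averaging
# and a diminishing stepsize alpha_k = c / k**s, s in (0, 1].
# Toy objective: f(w) = (1/n) * sum_i (a_i . w - b_i)^2.
def rr_averaged(A, b, c=0.02, s=0.75, epochs=300, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    avg = np.zeros(d)
    for k in range(1, epochs + 1):
        alpha = c / k**s                      # diminishing epoch stepsize
        for i in rng.permutation(n):          # fresh shuffle each epoch (RR)
            g = 2.0 * (A[i] @ w - b[i]) * A[i]
            w = w - alpha * g
        avg += (w - avg) / k                  # running average of epoch iterates
    return avg
```

The running average is updated incrementally so no history of iterates needs to be stored; early, inaccurate epochs contribute with weight 1/k and are washed out as k grows.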
• NeurIPS, 2020 — Develops theory for strongly convex objectives that tightly matches the known lower bounds for both RR and SO, substantiates the common practical heuristic of shuffling once or only a few times, and proves fast convergence of the Shuffle-Once algorithm, which shuffles the data only once.
• J. Mach. Learn. Res., 2019 — Shows that for stochastic problems arising in machine learning such a bound always holds, and proposes an alternative convergence analysis of SGD with a diminishing learning rate regime.