Corpus ID: 246652213

Nesterov Accelerated Shuffling Gradient Method for Convex Optimization

@inproceedings{Tran2022NesterovAS,
  title={Nesterov Accelerated Shuffling Gradient Method for Convex Optimization},
  author={Trang H. Tran and Lam M. Nguyen and Katya Scheinberg},
  booktitle={International Conference on Machine Learning},
  year={2022}
}
In this paper, we propose Nesterov Accelerated Shuffling Gradient (NASG), a new algorithm for convex finite-sum minimization problems. Our method integrates the traditional Nesterov acceleration momentum with different shuffling sampling schemes. We show that our algorithm has an improved rate of O(1/T) using unified shuffling schemes, where T is the number of epochs. This rate is better than that of any other shuffling gradient method in the convex regime. Our convergence analysis does… 
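The idea described above, per-epoch random reshuffling combined with Nesterov momentum, can be sketched as follows. This is a generic illustration of that combination (epoch-level momentum, a standard Nesterov weight, and a toy least-squares objective are all illustrative choices), not the paper's exact NASG update.

```python
import numpy as np

def nasg_sketch(A, b, T=200, lr=0.05, seed=0):
    """Minimize f(x) = (1/2n) * sum_i (a_i @ x - b_i)^2 with per-epoch
    random reshuffling plus an epoch-level Nesterov extrapolation.
    A generic sketch of the idea (shuffling + Nesterov momentum), not
    the exact update analyzed in the paper."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x, x_prev = np.zeros(d), np.zeros(d)
    for t in range(1, T + 1):
        beta = (t - 1) / (t + 2)         # classic Nesterov momentum weight
        w = x + beta * (x - x_prev)      # look-ahead point for this epoch
        for i in rng.permutation(n):     # fresh shuffle every epoch
            w -= lr * (A[i] @ w - b[i]) * A[i]  # one component-gradient step
        x_prev, x = x, w
    return x

# Usage: a small consistent least-squares problem, so the minimizer is known.
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 5))
x_star = rng.standard_normal(5)
b = A @ x_star
x_hat = nasg_sketch(A, b)
err = np.linalg.norm(x_hat - x_star)
```

Applying the momentum extrapolation once per epoch (rather than at every inner step) keeps the inner loop a plain shuffled pass over the data, which mirrors the epoch-wise structure that shuffling analyses work with.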

On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms

This work focuses on the shuffling version of SGD, which matches mainstream practical heuristics, and shows convergence to a global solution of shuffling SGD for a class of non-convex functions in overparameterized settings.

References

SMG: A Shuffling Gradient-Based Method with Momentum

This work proposes a novel shuffling gradient-based method with momentum, coined Shuffling Momentum Gradient (SMG), for non-convex finite-sum optimization problems; its update is fundamentally different from that of existing momentum-based methods.

A Unified Convergence Analysis for Shuffling-Type Gradient Methods

This paper provides a unified convergence analysis for a class of shuffling-type gradient methods for solving a well-known finite-sum minimization problem commonly used in machine learning and introduces new non-asymptotic and asymptotic convergence rates.

SGD with shuffling: optimal rates without component convexity and large epoch requirements

This work establishes minimax optimal convergence rates of these algorithms up to poly-log factors and further sharpens the tight convergence results for RandomShuffle by removing the drawbacks common to all prior art.

Random Shuffling Beats SGD after Finite Epochs

It is proved that under strong convexity and second-order smoothness, the sequence generated by RandomShuffle converges to the optimal solution at the rate O(1/T^2 + n^3/T^3), where n is the number of components in the objective and T is the total number of iterations.

Accelerated Gradient Methods for Stochastic Optimization and Online Learning

The proposed algorithm, SAGE (Stochastic Accelerated GradiEnt), exhibits fast convergence rates on stochastic composite optimization with convex or strongly convex objectives and can be extended for online learning, resulting in a simple algorithm but with the best regret bounds currently known for these problems.

How Good is SGD with Random Shuffling?

This paper proves that after $k$ passes over the $n$ individual functions, if the functions are re-shuffled after every pass, the best possible optimization error for SGD is at least $\Omega\left(1/(nk)^2 + 1/(nk^3)\right)$, which partially corresponds to recently derived upper bounds.

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as those of the best proximal function that can be chosen in hindsight.
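One well-known concrete instance of such adaptively chosen proximal functions is the standard diagonal AdaGrad update, where each coordinate's stepsize shrinks with its accumulated squared gradients. The sketch below runs it on a toy anisotropic quadratic; the test function and the constants are illustrative choices, not taken from the cited paper.

```python
import numpy as np

def adagrad(grad, x0, T=500, eta=0.5, eps=1e-8):
    """Standard diagonal AdaGrad: per-coordinate stepsizes scale with the
    inverse square root of the accumulated squared gradients."""
    x = np.asarray(x0, dtype=float).copy()
    G = np.zeros_like(x)                    # running sum of squared gradients
    for _ in range(T):
        g = grad(x)
        G += g * g
        x -= eta * g / (np.sqrt(G) + eps)   # coordinate-wise adaptive step
    return x

# Usage: minimize the anisotropic quadratic f(x) = x0^2 + 10 * x1^2.
f_grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
x_hat = adagrad(f_grad, [3.0, -2.0])
```

Because the accumulated gradients differ per coordinate, the badly scaled coordinate automatically receives a smaller raw step, which is the practical benefit the adaptive proximal-function view formalizes.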

Why random reshuffling beats stochastic gradient descent

This paper provides various convergence rate results for RR and its variants when the sum function is strongly convex, and shows that when the component functions are quadratics or smooth (with a Lipschitz assumption on the Hessian matrices), the optimization error of RR with iterate averaging and a diminishing stepsize α_k = Θ(1/k^s) converges to zero.

Random Reshuffling: Simple Analysis with Vast Improvements

The theory for strongly convex objectives tightly matches the known lower bounds for both RR and SO, and substantiates the common practical heuristic of shuffling once or only a few times. In particular, this work proves fast convergence of the Shuffle-Once (SO) algorithm, which shuffles the data only once.

New Convergence Aspects of Stochastic Gradient Algorithms

It is shown that for stochastic problems arising in machine learning such a bound always holds, and an alternative convergence analysis of SGD in the diminishing-learning-rate regime is proposed.
...