# A Continuized View on Nesterov Acceleration

@article{Berthier2021ACV, title={A Continuized View on Nesterov Acceleration}, author={Raphael Berthier and Francis R. Bach and Nicolas Flammarion and Pierre Gaillard and Adrien B. Taylor}, journal={ArXiv}, year={2021}, volume={abs/2102.06035} }

We introduce the “continuized” Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, one can use differential calculus to analyze convergence and obtain analytical expressions for the…
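The process described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' reference implementation. It assumes a strongly convex quadratic test function, gradient steps at the jump times of a unit-rate Poisson process, and the constant parameters η = η' = √(μ/L), γ = 1/L, γ' = 1/√(μL); none of these choices are stated in the summary above, and the function name `continuized_nesterov` is hypothetical. Because the mixing ODE is linear with constant coefficients, the flow between two jumps has a closed form, so no numerical ODE solver is needed.

```python
import numpy as np


def continuized_nesterov(grad, x0, mu, L, horizon, rng):
    """Simulate the two mixing variables (x, z) up to continuous time `horizon`.

    Assumed parameters (illustrative choice for a mu-strongly convex,
    L-smooth objective): eta = eta' = sqrt(mu / L), gamma = 1 / L,
    gamma' = 1 / sqrt(mu * L).
    """
    eta = np.sqrt(mu / L)
    gamma = 1.0 / L
    gamma_prime = 1.0 / np.sqrt(mu * L)

    x = np.array(x0, dtype=float)
    z = x.copy()
    t = 0.0
    while True:
        # Waiting time until the next gradient step: Exp(1), i.e. the
        # inter-arrival time of a unit-rate Poisson process (assumption).
        tau = rng.exponential(1.0)
        remaining = horizon - t
        dt = min(tau, remaining)

        # Exact solution of dx = eta (z - x) dt, dz = eta (x - z) dt:
        # x + z is conserved and x - z decays like exp(-2 eta dt).
        mean = 0.5 * (x + z)
        half_gap = 0.5 * (x - z) * np.exp(-2.0 * eta * dt)
        x, z = mean + half_gap, mean - half_gap

        if tau >= remaining:
            return x, z  # horizon reached before the next jump

        # Random-time gradient step, applied to both variables.
        t += tau
        g = grad(x)
        x = x - gamma * g
        z = z - gamma_prime * g


# Toy problem: f(x) = 0.5 * sum(eigs * x**2), with mu = 0.1 and L = 1.0.
eigs = np.array([0.1, 0.3, 0.5, 0.8, 1.0])


def f(x):
    return 0.5 * np.sum(eigs * x * x)


def grad_f(x):
    return eigs * x


rng = np.random.default_rng(0)
x_start = np.ones(5)
x_final, z_final = continuized_nesterov(
    grad_f, x_start, mu=0.1, L=1.0, horizon=100.0, rng=rng
)
print(f(x_start), f(x_final))
```

With a horizon of 100 the process takes about 100 gradient steps on average, since the jump rate is 1, so continuous time and gradient-step count are directly comparable in this sketch.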

## One Citation

Resonance in Weight Space: Covariate Shift Can Drive Divergence of SGD with Momentum

- Computer Science
- ArXiv
- 2022

It is shown that SGD with momentum (SGDm) under covariate shift with a fixed step-size can be unstable and diverge, a phenomenon known as resonance. The learning system is approximated as a time-varying system of ordinary differential equations, and its divergence/convergence is characterized as resonant/nonresonant modes.

## References

A Dynamical Systems Perspective on Nesterov Acceleration

- Computer Science
- ICML
- 2019

It is shown that Nesterov acceleration arises from discretizing an ordinary differential equation with a semi-implicit Euler integration scheme, and it is suggested that a curvature-dependent damping term lies at the heart of the phenomenon.

A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights

- Computer Science
- J. Mach. Learn. Res.
- 2016

A second-order ordinary differential equation is derived, which is the limit of Nesterov's accelerated gradient method, and it is shown that the continuous-time ODE allows for a better understanding of Nesterov's scheme.

Accelerated Mirror Descent in Continuous and Discrete Time

- Computer Science
- NIPS
- 2015

It is shown that a large family of first-order accelerated methods can be obtained as a discretization of the ODE, and these methods converge at a $\mathcal{O}(1/k^2)$ rate.

Rate of convergence of the Nesterov accelerated gradient method in the subcritical case α ≤ 3

- Mathematics
- ESAIM: Control, Optimisation and Calculus of Variations
- 2019

In a Hilbert space setting ℋ, given Φ : ℋ → ℝ a convex continuously differentiable function, and α a positive parameter, we consider the inertial dynamic system with Asymptotic Vanishing Damping …

Acceleration via Symplectic Discretization of High-Resolution Differential Equations

- Computer Science
- NeurIPS
- 2019

It is shown that the optimization algorithm generated by applying the symplectic scheme to a high-resolution ODE proposed by Shi et al.

A Lyapunov Analysis of Momentum Methods in Optimization

- Computer Science
- ArXiv
- 2016

There is an equivalence between the technique of estimate sequences and a family of Lyapunov functions in both continuous and discrete time, which allows for a simple and unified analysis of many existing momentum algorithms.

From Averaging to Acceleration, There is Only a Step-size

- Computer Science
- COLT
- 2015

We show that accelerated gradient descent, averaged gradient descent and the heavy-ball method for non-strongly-convex problems may be reformulated as constant parameter second-order difference…

Direct Runge-Kutta Discretization Achieves Acceleration

- Computer Science
- NeurIPS
- 2018

It is proved that under Lipschitz-gradient, convexity and order-$(s+2)$ differentiability assumptions, the sequence of iterates generated by discretizing the proposed second-order ODE converges to the optimal solution at a rate of $\mathcal{O}({N^{-2\frac{s}{s+1}}})$, where $s$ is the order of the Runge-Kutta numerical integrator.

Understanding the Acceleration Phenomenon via High-Resolution Differential Equations

- Mathematics, Computer Science
- Mathematical Programming
- 2021

An alternative limiting process yields high-resolution ODEs, which permit a general Lyapunov function framework for the analysis of convergence in both continuous and discrete time and are more accurate surrogates for the underlying algorithms.

A geometric alternative to Nesterov's accelerated gradient descent

- Computer Science
- ArXiv
- 2015

We propose a new method for unconstrained optimization of a smooth and strongly convex function, which attains the optimal rate of convergence of Nesterov’s accelerated gradient descent. The new…