# A Continuized View on Nesterov Acceleration

@article{Berthier2021ACV, title={A Continuized View on Nesterov Acceleration}, author={Raphael Berthier and Francis R. Bach and Nicolas Flammarion and Pierre Gaillard and Adrien B. Taylor}, journal={ArXiv}, year={2021}, volume={abs/2102.06035} }

We introduce the “continuized” Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, one can use differential calculus to analyze convergence and obtain analytical expressions for the…
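The process described in the abstract can be illustrated with a minimal sketch, under assumptions that the excerpt does not state. It assumes a μ-strongly convex, L-smooth objective, a rate-1 Poisson process for the random gradient times, and constant parameters (mixing rate `sqrt(mu/L)`, step `1/L` on `x`, step `1/sqrt(mu*L)` on `z`). The function name `continuized_nesterov` and the quadratic test problem are hypothetical and chosen only for illustration. Between two gradient times the linear mixing ODE is integrated exactly: the sum `x + z` is conserved and the difference `x - z` decays exponentially.

```python
import math
import random


def continuized_nesterov(grad, x0, mu, L, n_jumps, seed=0):
    """Simulate a continuized accelerated process (illustrative sketch).

    Assumed dynamics between gradient times:
        dx = eta * (z - x) dt,   dz = eta * (x - z) dt
    Assumed update at each gradient time (arrivals of a rate-1 Poisson process):
        x <- x - gamma_x * grad(x),   z <- z - gamma_z * grad(x)
    """
    rng = random.Random(seed)
    eta = math.sqrt(mu / L)            # assumed constant mixing rate
    gamma_x = 1.0 / L                  # assumed step size on x
    gamma_z = 1.0 / math.sqrt(mu * L)  # assumed step size on z
    x = list(x0)
    z = list(x0)
    for _ in range(n_jumps):
        # Waiting time until the next gradient step is Exp(1).
        tau = rng.expovariate(1.0)
        # Exact solution of the mixing ODE over a duration tau:
        # (x + z) / 2 stays fixed, (x - z) / 2 shrinks by exp(-2 * eta * tau).
        decay = math.exp(-2.0 * eta * tau)
        mean = [(xi + zi) / 2.0 for xi, zi in zip(x, z)]
        half = [decay * (xi - zi) / 2.0 for xi, zi in zip(x, z)]
        x = [m + h for m, h in zip(mean, half)]
        z = [m - h for m, h in zip(mean, half)]
        # Both variables take a gradient step, using the gradient at x.
        g = grad(x)
        x = [xi - gamma_x * gi for xi, gi in zip(x, g)]
        z = [zi - gamma_z * gi for zi, gi in zip(z, g)]
    return x


# Hypothetical test problem: f(x) = 0.5 * (x_1^2 + 100 * x_2^2), so mu = 1, L = 100.
curvatures = [1.0, 100.0]


def f(x):
    return 0.5 * sum(c * xi * xi for c, xi in zip(curvatures, x))


def grad_f(x):
    return [c * xi for c, xi in zip(curvatures, x)]


x_start = [1.0, 1.0]
x_final = continuized_nesterov(grad_f, x_start, mu=1.0, L=100.0, n_jumps=500)
f_start = f(x_start)
f_final = f(x_final)
```

Because the mixing step has a closed form, the only randomness in this sketch is the sequence of waiting times, so fixing `seed` makes a run reproducible.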

## References

A Dynamical Systems Perspective on Nesterov Acceleration

- Physics, Computer Science
- ICML
- 2019

It is shown that Nesterov acceleration arises from discretizing an ordinary differential equation with a semi-implicit Euler integration scheme, and it is suggested that a curvature-dependent damping term lies at the heart of the phenomenon.

A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2016

A second-order ordinary differential equation is derived, which is the limit of Nesterov's accelerated gradient method, and it is shown that the continuous-time ODE allows for a better understanding of Nesterov's scheme.

Accelerated Mirror Descent in Continuous and Discrete Time

- Computer Science, Mathematics
- NIPS
- 2015

It is shown that a large family of first-order accelerated methods can be obtained as a discretization of the ODE, and these methods converge at a $O(1/k^2)$ rate.

Rate of convergence of the Nesterov accelerated gradient method in the subcritical case α ≤ 3

- Mathematics
- ESAIM: Control, Optimisation and Calculus of Variations
- 2019

In a Hilbert space setting ℋ, given Φ : ℋ → ℝ a convex continuously differentiable function, and α a positive parameter, we consider the inertial dynamic system with Asymptotic Vanishing Damping …

Acceleration via Symplectic Discretization of High-Resolution Differential Equations

- Mathematics, Computer Science
- NeurIPS
- 2019

It is shown that the optimization algorithm generated by applying the symplectic scheme to a high-resolution ODE proposed by Shi et al.

A Lyapunov Analysis of Momentum Methods in Optimization

- Mathematics, Computer Science
- ArXiv
- 2016

There is an equivalence between the technique of estimate sequences and a family of Lyapunov functions in both continuous and discrete time, which allows for a simple and unified analysis of many existing momentum algorithms.

From Averaging to Acceleration, There is Only a Step-size

- Computer Science, Mathematics
- COLT
- 2015

We show that accelerated gradient descent, averaged gradient descent and the heavy-ball method for non-strongly-convex problems may be reformulated as constant parameter second-order difference…

Direct Runge-Kutta Discretization Achieves Acceleration

- Computer Science, Mathematics
- NeurIPS
- 2018

It is proved that under Lipschitz-gradient, convexity and order-$(s+2)$ differentiability assumptions, the sequence of iterates generated by discretizing the proposed second-order ODE converges to the optimal solution at a rate of $\mathcal{O}({N^{-2\frac{s}{s+1}}})$, where $s$ is the order of the Runge-Kutta numerical integrator.

Understanding the Acceleration Phenomenon via High-Resolution Differential Equations

- Mathematics, Computer Science
- Mathematical Programming
- 2021

An alternative limiting process yields high-resolution ODEs, which permit a general Lyapunov function framework for the analysis of convergence in both continuous and discrete time and are more accurate surrogates for the underlying algorithms.

A geometric alternative to Nesterov's accelerated gradient descent

- Mathematics, Computer Science
- ArXiv
- 2015

We propose a new method for unconstrained optimization of a smooth and strongly convex function, which attains the optimal rate of convergence of Nesterov’s accelerated gradient descent. The new…