A Continuized View on Nesterov Acceleration
@article{Berthier2021ACV,
  title   = {A Continuized View on Nesterov Acceleration},
  author  = {Raphael Berthier and Francis R. Bach and Nicolas Flammarion and Pierre Gaillard and Adrien B. Taylor},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2102.06035}
}
We introduce the “continuized” Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, one can use differential calculus to analyze convergence and obtain analytical expressions for the…
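The abstract describes two variables that mix through a linear ODE in continuous time and take gradient steps at random times. Below is a minimal simulation sketch of that structure on a strongly convex quadratic; the Poisson clock, the mixing rate eta, and the step sizes gamma and gamma_z are illustrative assumptions, not the tuned constants or rates established in the paper.

```python
import numpy as np

# Sketch of a continuized accelerated scheme: x and z mix through a linear ODE
# between the arrivals of a rate-1 Poisson process, and a gradient step is
# taken at each arrival.  Objective: f(x) = 0.5 * x^T A x (illustrative).

rng = np.random.default_rng(0)
d = 20
eigs = np.linspace(0.1, 1.0, d)
A = np.diag(eigs)
mu, L = eigs.min(), eigs.max()


def grad(x):
    return A @ x


eta = np.sqrt(mu / L)                # mixing rate (assumed)
gamma = 1.0 / L                      # gradient step on x (assumed)
gamma_z = 1.0 / np.sqrt(mu * L)      # gradient step on z (assumed)

x = rng.standard_normal(d)
z = x.copy()
t, T = 0.0, 200.0
while t < T:
    tau = rng.exponential(1.0)       # time until the next Poisson arrival
    # Integrate the linear mixing ODE  dx/dt = eta (z - x),  dz/dt = eta (x - z)
    # with small explicit Euler sub-steps over the interval of length tau.
    h = tau / 100
    for _ in range(100):
        x, z = x + h * eta * (z - x), z + h * eta * (x - z)
    g = grad(x)                      # gradient step at the arrival time
    x, z = x - gamma * g, z - gamma_z * g
    t += tau

print("f(x_T) =", 0.5 * x @ A @ x)
```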
One Citation
Resonance in Weight Space: Covariate Shift Can Drive Divergence of SGD with Momentum
- Computer Science · ArXiv, 2022
It is shown that SGD with momentum (SGDm) under covariate shift with a fixed step size can be unstable and diverge, suffering from a phenomenon known as resonance; the learning system is approximated as a time-varying system of ordinary differential equations, and its divergence or convergence is characterized in terms of resonant and nonresonant modes.
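Schematically (an illustrative reduction, not the exact model of that paper), such a system behaves like a damped oscillator with time-varying stiffness,
$$\ddot{\theta}(t) + a\,\dot{\theta}(t) + H(t)\,\theta(t) = 0,$$
where $H(t)$ tracks the shifting input covariance; when $H(t)$ oscillates near a natural frequency of the homogeneous system, solutions can grow (a resonant mode), and otherwise they decay (a nonresonant mode).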
References
Showing 1–10 of 40 references
A Dynamical Systems Perspective on Nesterov Acceleration
- Computer Science · ICML, 2019
It is shown that Nesterov acceleration arises from discretizing an ordinary differential equation with a semi-implicit Euler integration scheme, and it is suggested that a curvature-dependent damping term lies at the heart of the phenomenon.
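As a reminder of what semi-implicit (symplectic) Euler means here, consider the generic damped dynamics $\dot{x} = v$, $\dot{v} = -\beta_k v - \nabla f(x)$ with step size $h$ (a generic illustration, not the exact curvature-dependent scheme of that paper):
$$v_{k+1} = v_k - h\big(\beta_k v_k + \nabla f(x_k)\big), \qquad x_{k+1} = x_k + h\, v_{k+1},$$
i.e. the position update uses the already-updated velocity, which distinguishes the scheme from explicit Euler and produces momentum-type iterations.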
A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights
- Computer Science · J. Mach. Learn. Res., 2016
A second-order ordinary differential equation is derived as the limit of Nesterov's accelerated gradient method, and it is shown that the continuous-time ODE allows for a better understanding of Nesterov's scheme.
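The limiting ODE derived in that work (for convex $f$ with the usual step-size scaling) is
$$\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f(X(t)) = 0,$$
whose solutions satisfy $f(X(t)) - f^\star = O(1/t^2)$.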
Accelerated Mirror Descent in Continuous and Discrete Time
- Computer Science · NIPS, 2015
It is shown that a large family of first-order accelerated methods can be obtained as discretizations of the ODE, and these methods converge at an $O(1/k^2)$ rate.
Rate of convergence of the Nesterov accelerated gradient method in the subcritical case α ≤ 3
- Mathematics · ESAIM: Control, Optimisation and Calculus of Variations, 2019
In a Hilbert space setting ℋ, given Φ : ℋ → ℝ a convex continuously differentiable function, and α a positive parameter, we consider the inertial dynamic system with Asymptotic Vanishing Damping …
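The inertial dynamic in question, written from the quantities named in the abstract, is
$$\ddot{x}(t) + \frac{\alpha}{t}\,\dot{x}(t) + \nabla\Phi(x(t)) = 0,$$
and the subcritical regime of the title is $\alpha \le 3$, below the critical damping value $\alpha = 3$.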
Acceleration via Symplectic Discretization of High-Resolution Differential Equations
- Computer Science · NeurIPS, 2019
It is shown that the optimization algorithm generated by applying the symplectic scheme to a high-resolution ODE proposed by Shi et al. achieves an accelerated rate of convergence.
A Lyapunov Analysis of Momentum Methods in Optimization
- Computer Science · ArXiv, 2016
There is an equivalence between the technique of estimate sequences and a family of Lyapunov functions in both continuous and discrete time, which allows for a simple and unified analysis of many existing momentum algorithms.
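A standard example of such a Lyapunov function, for the ODE $\ddot{X} + \frac{3}{t}\dot{X} + \nabla f(X) = 0$ recalled above, is
$$\mathcal{E}(t) = t^2\big(f(X(t)) - f^\star\big) + 2\,\big\|X(t) + \tfrac{t}{2}\dot{X}(t) - x^\star\big\|^2,$$
which is nonincreasing along trajectories and immediately yields the $O(1/t^2)$ rate.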
From Averaging to Acceleration, There is Only a Step-size
- Computer Science · COLT, 2015
We show that accelerated gradient descent, averaged gradient descent and the heavy-ball method for non-strongly-convex problems may be reformulated as constant parameter second-order difference…
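Schematically, the common template is a constant-coefficient second-order recursion of the form
$$x_{k+1} = x_k + \beta\,(x_k - x_{k-1}) - \gamma\,\nabla f\big(x_k + \alpha\,(x_k - x_{k-1})\big),$$
with heavy ball corresponding to $\alpha = 0$ and Nesterov-type acceleration to $\alpha = \beta$; the exact parameterization used in that paper may differ.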
Direct Runge-Kutta Discretization Achieves Acceleration
- Computer Science · NeurIPS, 2018
It is proved that under Lipschitz-gradient, convexity and order-$(s+2)$ differentiability assumptions, the sequence of iterates generated by discretizing the proposed second-order ODE converges to the optimal solution at a rate of $\mathcal{O}({N^{-2\frac{s}{s+1}}})$, where $s$ is the order of the Runge-Kutta numerical integrator.
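For instance, plugging small orders into the stated rate gives $s = 1 \Rightarrow \mathcal{O}(N^{-1})$, $s = 2 \Rightarrow \mathcal{O}(N^{-4/3})$, and $s = 4 \Rightarrow \mathcal{O}(N^{-8/5})$, so the bound approaches the accelerated $\mathcal{O}(N^{-2})$ rate as the integrator order grows.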
Understanding the Acceleration Phenomenon via High-Resolution Differential Equations
- Mathematics, Computer Science · Mathematical Programming, 2021
An alternative limiting process yields high-resolution ODEs that permit a general Lyapunov-function framework for the analysis of convergence in both continuous and discrete time and that are more accurate surrogates for the underlying algorithms.
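For Nesterov's method for convex functions with step size $s$, the high-resolution ODE obtained in this line of work takes, up to the exact constants, the form
$$\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \sqrt{s}\,\nabla^2 f(X(t))\,\dot{X}(t) + \Big(1 + \frac{3\sqrt{s}}{2t}\Big)\nabla f(X(t)) = 0,$$
whose gradient-correction term $\sqrt{s}\,\nabla^2 f(X)\dot{X}$ is the $O(\sqrt{s})$ contribution that the low-resolution limit discards.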
A geometric alternative to Nesterov's accelerated gradient descent
- Computer Science · ArXiv, 2015
We propose a new method for unconstrained optimization of a smooth and strongly convex function, which attains the optimal rate of convergence of Nesterov’s accelerated gradient descent. The new…