Corpus ID: 231879963

A Continuized View on Nesterov Acceleration

Raphael Berthier, Francis R. Bach, Nicolas Flammarion, Pierre Gaillard, Adrien B. Taylor
We introduce the “continuized” Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, one can use differential calculus to analyze convergence and obtain analytical expressions for the… 
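The mechanism described in the abstract (two variables that mix through a linear ODE and take gradient steps at random times) can be sketched in a few lines. This is an illustrative simulation, not the authors' reference implementation. It assumes a μ-strongly convex, L-smooth objective, gradient steps at the jump times of a rate-1 Poisson process, and constant parameters η = η' = sqrt(μ/L), γ = 1/L, γ' = 1/sqrt(μL); these parameter choices, the function name `continuized_nesterov`, and the quadratic test problem are assumptions made for illustration. Between jumps the linear mixing ODE is solved in closed form, so no time discretization is needed.

```python
import math
import random


def continuized_nesterov(grad, x0, mu, L, n_jumps, seed=0):
    """Simulate a continuized accelerated method up to its n_jumps-th gradient step.

    Assumed dynamics (illustrative parameter choice, strongly convex case):
        dx = eta   * (z - x) dt - gamma   * grad(x) dN(t)
        dz = eta_p * (x - z) dt - gamma_p * grad(x) dN(t)
    where N is a Poisson process with rate 1.
    """
    rng = random.Random(seed)
    eta = eta_p = math.sqrt(mu / L)      # mixing rates (assumption)
    gamma = 1.0 / L                      # gradient step on x (assumption)
    gamma_p = 1.0 / math.sqrt(mu * L)    # gradient step on z (assumption)
    s = eta + eta_p

    x = list(x0)
    z = list(x0)
    for _ in range(n_jumps):
        # Waiting time until the next gradient step: exponential with mean 1.
        dt = rng.expovariate(1.0)
        # Exact solution of the linear mixing ODE over [t, t + dt]:
        # (eta_p * x + eta * z) is conserved and (x - z) decays like exp(-s * dt).
        decay = math.exp(-s * dt)
        for i in range(len(x)):
            m = (eta_p * x[i] + eta * z[i]) / s
            d = (x[i] - z[i]) * decay
            x[i] = m + (eta / s) * d
            z[i] = m - (eta_p / s) * d
        # Gradient step taken by both variables at the random time.
        g = grad(x)
        for i in range(len(x)):
            x[i] -= gamma * g[i]
            z[i] -= gamma_p * g[i]
    return x


# Hypothetical test problem: f(x) = 0.5 * sum(a_i * x_i^2), so mu = 1 and L = 100.
A = [1.0, 100.0]


def quad_value(x):
    return 0.5 * sum(a * xi * xi for a, xi in zip(A, x))


def quad_grad(x):
    return [a * xi for a, xi in zip(A, x)]


if __name__ == "__main__":
    x_final = continuized_nesterov(quad_grad, [1.0, 1.0], mu=1.0, L=100.0, n_jumps=2000)
    print(quad_value([1.0, 1.0]), quad_value(x_final))
```

Because the waiting times are random, individual runs differ with the seed; any convergence guarantee for such a process is naturally stated in expectation or with high probability rather than for every trajectory.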


Resonance in Weight Space: Covariate Shift Can Drive Divergence of SGD with Momentum
It is shown that SGDm under covariate shift with a fixed step-size can be unstable and diverge, a phenomenon known as resonance. The learning system is approximated as a time-varying system of ordinary differential equations, and its divergence/convergence is characterized in terms of resonant/nonresonant modes.


A Dynamical Systems Perspective on Nesterov Acceleration
It is shown that Nesterov acceleration arises from discretizing an ordinary differential equation with a semi-implicit Euler integration scheme, and it is suggested that a curvature-dependent damping term lies at the heart of the phenomenon.
A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights
A second-order ordinary differential equation is derived, which is the limit of Nesterov's accelerated gradient method, and it is shown that the continuous-time ODE allows for a better understanding of Nesterov's scheme.
Accelerated Mirror Descent in Continuous and Discrete Time
It is shown that a large family of first-order accelerated methods can be obtained as a discretization of the ODE, and these methods converge at an $O(1/k^2)$ rate.
Rate of convergence of the Nesterov accelerated gradient method in the subcritical case α ≤ 3
In a Hilbert space setting ℋ, given Φ : ℋ → ℝ a convex continuously differentiable function, and α a positive parameter, we consider the inertial dynamic system with Asymptotic Vanishing Damping
Acceleration via Symplectic Discretization of High-Resolution Differential Equations
It is shown that the optimization algorithm generated by applying the symplectic scheme to a high-resolution ODE proposed by Shi et al.
A Lyapunov Analysis of Momentum Methods in Optimization
There is an equivalence between the technique of estimate sequences and a family of Lyapunov functions in both continuous and discrete time, which allows for a simple and unified analysis of many existing momentum algorithms.
From Averaging to Acceleration, There is Only a Step-size
We show that accelerated gradient descent, averaged gradient descent and the heavy-ball method for non-strongly-convex problems may be reformulated as constant parameter second-order difference
Direct Runge-Kutta Discretization Achieves Acceleration
It is proved that under Lipschitz-gradient, convexity and order-$(s+2)$ differentiability assumptions, the sequence of iterates generated by discretizing the proposed second-order ODE converges to the optimal solution at a rate of $\mathcal{O}({N^{-2\frac{s}{s+1}}})$, where $s$ is the order of the Runge-Kutta numerical integrator.
Understanding the Acceleration Phenomenon via High-Resolution Differential Equations
An alternative limiting process yields high-resolution ODEs, which permit a general Lyapunov function framework for the analysis of convergence in both continuous and discrete time and are more accurate surrogates for the underlying algorithms.
A geometric alternative to Nesterov's accelerated gradient descent
We propose a new method for unconstrained optimization of a smooth and strongly convex function, which attains the optimal rate of convergence of Nesterov’s accelerated gradient descent. The new