• Corpus ID: 56219261

On Symplectic Optimization

Michael Betancourt, Michael I. Jordan, Ashia C. Wilson. arXiv: Computation.

Accelerated gradient methods have had significant impact in machine learning -- in particular the theoretical side of machine learning -- due to their ability to achieve oracle lower bounds. But their heuristic construction has hindered their full integration into the practical machine-learning algorithmic toolbox, and has limited their scope. In this paper we build on recent work which casts acceleration as a phenomenon best explained in continuous time, and we augment that picture by… 


Optimization algorithms inspired by the geometry of dissipative systems

Dynamical systems defined through a contact geometry are introduced which are not only naturally suited to the optimization goal but also subsume all previous methods based on geometric dynamical systems, which shows that optimization algorithms that achieve oracle lower bounds on convergence rates can be obtained.

On dissipative symplectic integration with applications to gradient-based optimization

A generalization of symplectic integrators to non-conservative and in particular dissipative Hamiltonian systems is able to preserve rates of convergence up to a controlled error, enabling the derivation of ‘rate-matching’ algorithms without the need for a discrete convergence analysis.
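As a rough illustration of what a dissipative symplectic step can look like, the sketch below applies a conformal symplectic Euler update to a damped Hamiltonian system with H(x, p) = p^2/2 + f(x). It is not taken from the paper summarized above: the function name `conformal_symplectic_euler`, the damping constant, the step size, and the one-dimensional test objective f(x) = x^2/2 are all assumptions chosen for illustration.

```python
import math


def conformal_symplectic_euler(grad, x0, p0, step, damping, iters):
    """Integrate x' = p, p' = -grad f(x) - damping * p (illustrative sketch).

    Each step first contracts the momentum by exp(-damping * step), then
    applies a symplectic Euler update. In one dimension the resulting map
    shrinks phase-space area by exactly exp(-damping * step) per step.
    """
    x, p = x0, p0
    decay = math.exp(-damping * step)
    for _ in range(iters):
        p = decay * p - step * grad(x)  # damp, then kick with the gradient
        x = x + step * p                # drift using the updated momentum
    return x, p


# Hypothetical test objective: f(x) = x^2 / 2, so grad f(x) = x, minimum at 0.
x_final, p_final = conformal_symplectic_euler(
    grad=lambda x: x, x0=1.0, p0=0.0, step=0.1, damping=1.0, iters=500
)
```

With these assumed settings the iterate spirals into the minimizer at the origin; larger step sizes or weaker damping would change the oscillation and decay behaviour.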

Variational Symplectic Accelerated Optimization on Lie Groups

A Lie group variational discretization based on an extended path space formulation of the Bregman Lagrangian on Lie groups is developed, and its computational properties are analyzed with two examples in attitude determination and vision-based localization.

Conformal symplectic and relativistic optimization

This work proposes a new algorithm based on a dissipative relativistic system that normalizes the momentum and may result in more stable/faster optimization, and generalizes both Nesterov and heavy ball.

Continuous Time Analysis of Momentum Methods

This work focuses on understanding the role of momentum in the training of neural networks, concentrating on the common situation in which the momentum contribution is fixed at each step of the algorithm, and proves three continuous time approximations of the discrete algorithms.


Michael I. Jordan. Computer Science. Proceedings of the International Congress of Mathematicians (ICM 2018), 2019.
This work goes beyond classical gradient flow to focus on second-order dynamics, aiming to show the relevance of such dynamics to optimization algorithms that not only converge, but converge quickly.

Optimization on manifolds: A symplectic approach

There has been great interest in using tools from dynamical systems and numerical analysis of differential equations to understand and construct new optimization methods. In particular, recently a

Practical Perspectives on Symplectic Accelerated Optimization

This paper investigates how momentum restarting schemes improve computational efficiency and robustness by reducing the undesirable effect of oscillations, and ease the tuning process by making time-adaptivity superfluous.

Conformal Symplectic and Relativistic Optimization

This work proposes a new algorithm based on a dissipative relativistic system that normalizes the momentum and may result in more stable/faster optimization, and generalizes both Nesterov and heavy ball, and has potential advantages at no additional cost.

The Role of Memory in Stochastic Optimization

This work derives a general continuous-time model that can incorporate arbitrary types of memory, for both deterministic and stochastic settings, and provides convergence guarantees for this SDE for weakly-quasi-convex and quadratically growing functions.



A variational perspective on accelerated methods in optimization

A variational, continuous-time framework for understanding accelerated methods is proposed and a systematic methodology for converting accelerated higher-order methods from continuous time to discrete time is provided, which illuminates a class of dynamics that may be useful for designing better algorithms for optimization.

Accelerated Mirror Descent in Continuous and Discrete Time

It is shown that a large family of first-order accelerated methods can be obtained as a discretization of the ODE, and these methods converge at a O(1/k^2) rate.

On the Nonlinear Stability of Symplectic Integrators

The modified Hamiltonian is used to study the nonlinear stability of symplectic integrators, especially for nonlinear oscillators. We give conditions under which an initial condition on a compact

A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights

A second-order ordinary differential equation is derived, which is the limit of Nesterov's accelerated gradient method, and it is shown that the continuous time ODE allows for a better understanding of Nesterov's scheme.
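For orientation, the ODE in question is commonly written as X'' + (3/t) X' + grad f(X) = 0, with continuous time t related to the iteration count k and step size s by t ≈ k·sqrt(s). The sketch below compares plain gradient descent with the familiar form of Nesterov's scheme that uses the momentum coefficient (k - 1)/(k + 2). The quadratic test objective, the step size, and all function names are assumptions made for this example, not details drawn from the paper.

```python
# Hypothetical test problem: f(x) = 0.5 * (x1^2 + 100 * x2^2), minimized at the origin.
EIGS = (1.0, 100.0)


def f(x):
    return 0.5 * sum(a * xi * xi for a, xi in zip(EIGS, x))


def grad(x):
    return tuple(a * xi for a, xi in zip(EIGS, x))


def gradient_descent(x0, step, iters):
    x = tuple(x0)
    for _ in range(iters):
        g = grad(x)
        x = tuple(xi - step * gi for xi, gi in zip(x, g))
    return x


def nesterov(x0, step, iters):
    x_prev = tuple(x0)
    y = tuple(x0)
    for k in range(1, iters + 1):
        g = grad(y)
        # Gradient step taken from the extrapolated point y.
        x = tuple(yi - step * gi for yi, gi in zip(y, g))
        # Momentum coefficient (k - 1) / (k + 2), which tends to 1 as k grows.
        beta = (k - 1) / (k + 2)
        y = tuple(xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev))
        x_prev = x
    return x_prev


x0 = (1.0, 1.0)
step = 1.0 / max(EIGS)  # step size 1/L, with L the largest curvature
f_gd = f(gradient_descent(x0, step, 100))
f_nag = f(nesterov(x0, step, 100))
```

On this ill-conditioned quadratic the accelerated iterate reaches a noticeably lower objective value than gradient descent after the same number of gradient evaluations, which is the qualitative behaviour the ODE viewpoint is meant to explain.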

The Fundamental Incompatibility of Scalable Hamiltonian Monte Carlo and Naive Data Subsampling

It is demonstrated how data subsampling fundamentally compromises the scalability of Hamiltonian Monte Carlo.

Simulating Hamiltonian dynamics


Introduction to Smooth Manifolds

Preface.- 1 Smooth Manifolds.- 2 Smooth Maps.- 3 Tangent Vectors.- 4 Submersions, Immersions, and Embeddings.- 5 Submanifolds.- 6 Sard's Theorem.- 7 Lie Groups.- 8 Vector Fields.- 9 Integral Curves

Classical Dynamics: A Contemporary Approach

1. Fundamentals of mechanics 2. Lagrangian formulation of mechanics 3. Topics in Lagrangian dynamics 4. Scattering and linear oscillations 5. Hamiltonian formulation of mechanics 6. Topics in

Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent

To the best of our knowledge, this is the first Hessian-free algorithm to find a second-order stationary point faster than GD, and also the first single-loop algorithm with a faster rate than GD even in the setting of finding a first-order stationary point.