HessianFR: An Efficient Hessian-based Follow-the-Ridge Algorithm for Minimax Optimization

Yihang Gao, Huafeng Liu, Michael K. Ng, Mingjie Zhou
Wide applications of differentiable two-player sequential games (e.g., image generation by GANs) have drawn considerable research interest in efficient and fast algorithms. Most existing algorithms are built on favorable properties of simultaneous games (i.e., convex-concave payoff functions) and do not apply to sequential games in other settings. Some conventional gradient descent ascent algorithms fail, both theoretically and numerically, to find the local…

On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach
In theory, the proposed Follow-the-Ridge (FR) algorithm addresses the notorious rotational behaviour of gradient dynamics and is compatible with preconditioning and positive momentum; in practice, it improves the convergence of GAN training compared with recent minimax optimization algorithms.
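A minimal sketch of the idea, under stated assumptions: the toy objective f(x, y) = -3x^2 - y^2 + 4xy, the step sizes, and the helper names are all chosen here for illustration. The FR-style step is taken to be the follower's ascent step plus a correction lr_x * H_yy^{-1} * H_yx * grad_x f, which is the commonly cited form of the rule and is not stated in the summary above. On this objective, (0, 0) is a local minimax point that plain gradient descent ascent (GDA) moves away from, while the FR-style step converges to it.

```python
# Toy objective (an assumption for illustration): f(x, y) = -3x^2 - y^2 + 4xy.
# Here H_yy = -2 < 0 and H_xx - H_xy * H_yy^{-1} * H_yx = -6 + 8 = 2 > 0,
# so (0, 0) is a local minimax point even though f is concave in x.

def grad_x(x, y):
    return -6.0 * x + 4.0 * y

def grad_y(x, y):
    return 4.0 * x - 2.0 * y

H_YY = -2.0  # second derivative of f with respect to y
H_YX = 4.0   # mixed second derivative of f

def gda_step(x, y, lr):
    # Plain gradient descent on x, gradient ascent on y.
    return x - lr * grad_x(x, y), y + lr * grad_y(x, y)

def fr_step(x, y, lr_x, lr_y):
    # FR-style step: the follower y gets an extra correction term
    # lr_x * H_yy^{-1} * H_yx * grad_x f that keeps it near the ridge.
    gx, gy = grad_x(x, y), grad_y(x, y)
    correction = lr_x * (H_YX / H_YY) * gx
    return x - lr_x * gx, y + lr_y * gy + correction

def run(step, steps, *lrs):
    # Iterate a step function from the fixed starting point (1, 1).
    x, y = 1.0, 1.0
    for _ in range(steps):
        x, y = step(x, y, *lrs)
    return x, y

if __name__ == "__main__":
    print("GDA after 200 steps:", run(gda_step, 200, 0.05))
    print("FR  after 500 steps:", run(fr_step, 500, 0.05, 0.05))
```

In this scalar setting the Hessian blocks are plain numbers; for neural networks the inverse-Hessian-vector product would have to be approximated, which is where the cost of such methods comes from.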
The Mechanics of n-Player Differentiable Games
The key result is a decomposition of the second-order dynamics into two components: the first relates to potential games, which reduce to gradient descent on an implicit function; the second relates to Hamiltonian games, a new class of games that obey a conservation law, akin to conservation laws in classical mechanical systems.
GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
This work proposes a two time-scale update rule (TTUR) for training GANs with stochastic gradient descent on arbitrary GAN loss functions and introduces the "Fréchet Inception Distance" (FID), which captures the similarity of generated images to real ones better than the Inception Score.
The Limit Points of (Optimistic) Gradient Descent in Min-Max Optimization
This work characterizes the limit points of two basic first-order methods, namely Gradient Descent Ascent (GDA) and Optimistic Gradient Descent Ascent (OGDA), and shows that both dynamics avoid unstable critical points for almost all initializations.
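The two update rules can be contrasted on a toy problem. The sketch below is only an illustration under assumptions: the bilinear objective f(x, y) = x * y, the step size 0.3, the helper names, and the "past gradient" form of OGDA (current gradient counted twice, previous gradient subtracted once) are choices made here, not details taken from the summary above.

```python
def grads(x, y):
    # For f(x, y) = x * y: df/dx = y and df/dy = x.
    return y, x

def run_gda(steps, lr, x=1.0, y=1.0):
    # Simultaneous gradient descent (on x) and ascent (on y).
    for _ in range(steps):
        gx, gy = grads(x, y)
        x, y = x - lr * gx, y + lr * gy
    return x, y

def run_ogda(steps, lr, x=1.0, y=1.0):
    # Optimistic variant: use twice the current gradient minus the previous one.
    prev_gx, prev_gy = grads(x, y)  # first step then equals a plain GDA step
    for _ in range(steps):
        gx, gy = grads(x, y)
        x, y = (x - 2.0 * lr * gx + lr * prev_gx,
                y + 2.0 * lr * gy - lr * prev_gy)
        prev_gx, prev_gy = gx, gy
    return x, y

def norm(point):
    # Euclidean distance of (x, y) from the equilibrium (0, 0).
    return (point[0] ** 2 + point[1] ** 2) ** 0.5

if __name__ == "__main__":
    print("GDA  distance from (0, 0):", norm(run_gda(500, 0.3)))
    print("OGDA distance from (0, 0):", norm(run_ogda(500, 0.3)))
```

On this particular objective GDA spirals outward from the equilibrium while OGDA contracts toward it; behaviour on other objectives can differ.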
What is Local Optimality in Nonconvex-Nonconcave Minimax Optimization?
A proper mathematical definition of local optimality for this sequential setting, local minimax, is proposed, and its properties and existence results are presented.
Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study
Detailed empirical evaluations of a class of Newton-type methods, namely sub-sampled variants of trust region (TR) and adaptive regularization with cubics (ARC) algorithms, for non-convex ML problems demonstrate that these methods can not only be computationally competitive with hand-tuned SGD with momentum, obtaining comparable or better generalization performance, but are also highly robust to hyper-parameter settings.
The Numerics of GANs
This paper analyzes the numerics of common algorithms for training Generative Adversarial Networks (GANs) and designs a new algorithm that overcomes some of these limitations and has better convergence properties.
Newton-type Methods for Minimax Optimization
This work provides a detailed analysis of existing algorithms and relates them to two novel Newton-type algorithms that converge faster to (strict) local minimax points and are much more effective when the problem is ill-conditioned.
Improving the Improved Training of Wasserstein GANs: A Consistency Term and Its Dual Effect
This paper proposes a novel approach to enforcing the Lipschitz continuity in the training procedure of WGANs, which gives rise to not only better photo-realistic samples than the previous methods but also state-of-the-art semi-supervised learning results.
Adam: A Method for Stochastic Optimization
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
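To make "adaptive estimates of lower-order moments" concrete, here is a minimal scalar sketch of the Adam update. The test objective (theta - 3)^2, the learning rate of 0.1, the step count, and the function name `adam_minimize` are assumptions made for this illustration; only the update structure (exponential moving averages of the gradient and its square, with bias correction) follows the published algorithm.

```python
import math

def adam_minimize(grad, theta, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=2000):
    """Minimize a scalar function given its gradient, using Adam updates."""
    m, v = 0.0, 0.0  # running first- and second-moment estimates
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1.0 - beta1) * g        # moving average of gradients
        v = beta2 * v + (1.0 - beta2) * g * g    # moving average of squared gradients
        m_hat = m / (1.0 - beta1 ** t)           # correct the bias toward zero
        v_hat = v / (1.0 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# Hypothetical objective for illustration: f(theta) = (theta - 3)^2.
theta_star = adam_minimize(lambda th: 2.0 * (th - 3.0), theta=0.0)

if __name__ == "__main__":
    print("estimated minimizer:", theta_star)
```

Dividing by the square root of the second-moment estimate makes the effective step size roughly independent of the gradient's scale, which is one reason the method needs little tuning in practice.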