• Corpus ID: 218900864

First-order Convergence Theory for Weakly-Convex-Weakly-Concave Min-max Problems

@article{Liu2021FirstorderCT,
  title={First-order Convergence Theory for Weakly-Convex-Weakly-Concave Min-max Problems},
  author={Mingrui Liu and Hassan Rafique and Qihang Lin and Tianbao Yang},
  journal={J. Mach. Learn. Res.},
  year={2021},
  volume={22},
  pages={169:1-169:34}
}
In this paper, we consider first-order convergence theory and algorithms for solving a class of non-convex non-concave min-max saddle-point problems whose objective function is weakly convex in the variables of minimization and weakly concave in the variables of maximization. This class of problems has many important applications in machine learning, including training Generative Adversarial Nets (GANs). We propose an algorithmic framework motivated by the inexact proximal point method, where the weakly monotone…
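
As a rough illustration of the inexact proximal point idea mentioned in the abstract, the sketch below runs a generic two-loop scheme on a toy problem. It is not the authors' algorithm: the objective `f(x, y) = x*y - (RHO/2)*x**2 + (RHO/2)*y**2`, the constant `RHO`, the step sizes, the iteration counts, and all function names are assumptions chosen for illustration. Each outer step adds a quadratic proximal term (with `gamma < 1/RHO`) so that the subproblem becomes strongly convex in `x` and strongly concave in `y`, and then solves it only approximately with a few gradient descent-ascent steps.

```python
# Toy objective (an assumption for illustration, not taken from the paper):
#   f(x, y) = x*y - (RHO/2)*x**2 + (RHO/2)*y**2
# It is RHO-weakly convex in x and RHO-weakly concave in y.
RHO = 0.1


def grad_f(x, y):
    """Return (df/dx, df/dy) for the toy objective."""
    return y - RHO * x, x + RHO * y


def inexact_prox_step(x_c, y_c, gamma, inner_steps=20, eta=0.3):
    """Approximately solve the regularized subproblem
        min_x max_y f(x, y) + (x - x_c)**2 / (2*gamma) - (y - y_c)**2 / (2*gamma)
    with a fixed number of gradient descent-ascent steps (hypothetical inner solver)."""
    x, y = x_c, y_c
    for _ in range(inner_steps):
        gx, gy = grad_f(x, y)
        gx += (x - x_c) / gamma  # gradient of the added quadratic in x
        gy -= (y - y_c) / gamma  # gradient of the subtracted quadratic in y
        x, y = x - eta * gx, y + eta * gy  # descend in x, ascend in y
    return x, y


def inexact_proximal_point(x0, y0, gamma=1.0, outer_steps=50):
    """Outer loop: repeatedly re-center the proximal term at the latest iterate.
    gamma must stay below 1/RHO so each subproblem is strongly convex-concave."""
    x, y = x0, y0
    for _ in range(outer_steps):
        x, y = inexact_prox_step(x, y, gamma)
    return x, y


x_out, y_out = inexact_proximal_point(1.0, 1.0)
grad_norm = sum(g * g for g in grad_f(x_out, y_out)) ** 0.5
```

On this toy problem the only stationary point is the origin, so `grad_norm` should end up close to zero. The fixed inner step count is the simplest possible stopping rule; a practical variant would more likely stop the inner loop on an accuracy criterion.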

AdAUC: End-to-end Adversarial AUC Optimization Against Long-tail Problems
TLDR
This work reformulates the AUC optimization problem as a saddle point problem, where the objective becomes an instance-wise function, and reformulates the algorithm to generate adversarial examples by calculating the gradient of a min-max problem, providing a convergence guarantee for the proposed algorithm.
Semi-Implicit Hybrid Gradient Methods with Application to Adversarial Robustness
TLDR
This work generalizes the stochastic primal-dual hybrid gradient algorithm to develop semi-implicit hybrid gradient methods (SI-HGs) for stationary points of nonconvex-nonconcave minimax problems and shows that it outperforms other AT algorithms in terms of convergence speed and robustness.
Accelerated Algorithms for Monotone Inclusions and Constrained Nonconvex-Nonconcave Min-Max Optimization
TLDR
It is proved that the Extra Anchored Gradient algorithm, originally proposed by Yoon and Ryu for unconstrained convex-concave min-max optimization, can be applied to solve the more general problem of Lipschitz monotone inclusion.
First-Order Algorithms for Min-Max Optimization in Geodesic Metric Spaces
TLDR
It is proved that the Riemannian corrected extragradient (RCEG) method achieves last-iterate convergence at a linear rate in the geodesically strongly-convex-concave case, matching the Euclidean result.
Nest Your Adaptive Algorithm for Parameter-Agnostic Nonconvex Minimax Optimization
TLDR
A Nested Adaptive framework, NeAda, is formally introduced that carries an inner loop for adaptively maximizing the dual variable with controllable stopping criteria and an outer loop for adaptively minimizing the primal variable, and is claimed to be the first algorithm that simultaneously achieves near-optimal convergence rates and parameter-agnostic adaptation in the nonconvex minimax setting.
Stochastic Gradient Methods with Compressed Communication for Decentralized Saddle Point Problems
TLDR
The proposed algorithms are the first to achieve sub-linear/linear computation and communication complexities using, respectively, stochastic gradient/stochastic variance reduced gradient oracles with compressed information exchange to solve non-smooth strongly-convex-strongly-concave saddle-point problems in the decentralized setting.
Newton and interior-point methods for (constrained) nonconvex-nonconcave minmax optimization with stability guarantees
TLDR
The first minmax result shows that additional conditions are needed to guarantee that every locally asymptotically stable equilibrium point of a Newton-type iteration is a local minmax, and shows that the computation time of the proposed algorithm scales roughly linearly with the number of nonzero elements in the Hessian.
Perseus: A Simple and Optimal High-Order Method for Variational Inequalities
TLDR
A $p$th-order method is proposed that does not require any binary search procedure and it is proved that it can converge to a weak solution at a global rate of $O(\epsilon^{-2/(p+1)})$.
Perseus: A Simple High-Order Regularization Method for Variational Inequalities
TLDR
This paper proposes a $p$th-order method which does not require any binary search scheme and is guaranteed to converge to a weak solution with a global rate of $O(\epsilon^{-2/(p+1)})$.
Stability and Generalization of Differentially Private Minimax Problems
TLDR
This paper focuses on the privacy of the general minimax setting, combining differential privacy with the minimax optimization paradigm, and theoretically analyzes the high-probability generalization performance of the differentially private minimax algorithm under the strongly-convex-strongly-concave condition.

References

SHOWING 1-10 OF 81 REFERENCES
Non-Convex Min-Max Optimization: Provable Algorithms and Applications in Machine Learning
TLDR
This paper proposes a proximally guided stochastic subgradient method and a proximally guided stochastic variance-reduced method for expected and finite-sum saddle-point problems, respectively, and establishes the computation complexities of both methods for finding a nearly stationary point of the corresponding minimization problem.
Stochastic Variance Reduction Methods for Saddle-Point Problems
TLDR
Convex-concave saddle-point problems where the objective functions may be split in many components are considered, and recent stochastic variance reduction methods are extended to provide the first large-scale linearly convergent algorithms.
Stochastic model-based minimization of weakly convex functions
TLDR
This work shows that under weak-convexity and Lipschitz conditions, the algorithm drives the expected norm of the gradient of the Moreau envelope to zero at the rate of $O(k^{-1/4})$.
Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions
TLDR
A universal stagewise optimization framework for a broad family of weakly convex problems with the following key features is proposed: at each stage any suitable stochastic convex optimization algorithms that return an averaged solution can be employed for minimizing a regularized convex problem.
An efficient primal dual prox method for non-smooth optimization
TLDR
A primal dual prox method is developed that solves the minimax optimization problem at a rate of $O(1/T)$ assuming that the proximal step can be efficiently solved, significantly faster than a standard subgradient descent method that has an $O(1/\sqrt{T})$ convergence rate.
On the Convergence Rate of Stochastic Mirror Descent for Nonsmooth Nonconvex Optimization
TLDR
It is proved that SMD, without the use of mini-batch, is guaranteed to converge to a stationary point at a convergence rate of $O(1/\sqrt{t})$, which matches existing results for the stochastic subgradient method, but is evaluated under a stronger stationarity measure.
Accelerated gradient methods for nonconvex nonlinear and stochastic programming
TLDR
The AG method is generalized to solve nonconvex and possibly stochastic optimization problems and it is demonstrated that by properly specifying the stepsize policy, the AG method exhibits the best known rate of convergence for solving general nonconvex smooth optimization problems by using first-order information, similarly to the gradient descent method.
Proximally Guided Stochastic Subgradient Method for Nonsmooth, Nonconvex Problems
TLDR
A simple proof that the proposed algorithm converges at the same rate as the stochastic gradient method for smooth nonconvex problems is presented, which appears to be the first convergence rate analysis of a stochastic subgradient method for the class of weakly convex functions.
Unified Convergence Analysis of Stochastic Momentum Methods for Convex and Non-convex Optimization
TLDR
The empirical results for optimizing deep neural networks demonstrate that the stochastic variant of Nesterov's accelerated gradient method achieves a good tradeoff (between speed of convergence in training error and robustness of convergence in testing error) among the three stochastic methods.
The Limit Points of (Optimistic) Gradient Descent in Min-Max Optimization
TLDR
This work characterizes the limit points of two basic first order methods, namely Gradient Descent/Ascent (GDA) and Optimistic Gradient Descent Ascent (OGDA), and shows that both dynamics avoid unstable critical points for almost all initializations.