• Corpus ID: 218900864

First-order Convergence Theory for Weakly-Convex-Weakly-Concave Min-max Problems

@article{Liu2021FirstorderCT,
  title={First-order Convergence Theory for Weakly-Convex-Weakly-Concave Min-max Problems},
  author={Mingrui Liu and Hassan Rafique and Qihang Lin and Tianbao Yang},
  journal={J. Mach. Learn. Res.},
  year={2021},
  volume={22},
  pages={169:1-169:34}
}
In this paper, we consider first-order convergence theory and algorithms for solving a class of non-convex non-concave min-max saddle-point problems, whose objective function is weakly convex in the variables of minimization and weakly concave in the variables of maximization. It has many important applications in machine learning including training Generative Adversarial Nets (GANs). We propose an algorithmic framework motivated by the inexact proximal point method, where the weakly monotone… 
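The inexact proximal point idea can be illustrated with a minimal sketch. It assumes deterministic gradients and a hypothetical toy problem $f(x,y) = xy + \log(1+x^2)$, which is $\rho$-weakly convex in $x$ with $\rho = 1/4$ and concave in $y$. Each stage adds the proximal terms $\frac{\gamma}{2}(x-x_k)^2 - \frac{\gamma}{2}(y-y_k)^2$ with $\gamma > \rho$. This makes the subproblem strongly convex in $x$ and strongly concave in $y$, and the subproblem is then solved only approximately. Plain gradient descent ascent (GDA) serves as the inner solver purely for illustration. The names `inexact_proximal_point`, `gamma`, and `eta` are hypothetical and do not come from the paper.

```python
# Hypothetical toy problem (not from the paper): f(x, y) = x*y + log(1 + x^2).
# d^2f/dx^2 = 2(1 - x^2)/(1 + x^2)^2 >= -1/4, so f is rho-weakly convex in x
# with rho = 1/4; f is linear, hence concave, in y.

def grad_x(x, y):
    return y + 2.0 * x / (1.0 + x * x)

def grad_y(x, y):
    return x

def inexact_proximal_point(x0, y0, gamma=1.0, outer_steps=200,
                           inner_steps=200, eta=0.05):
    """Approximately solve min_x max_y f(x, y) by inexact proximal point steps.

    Stage k runs a fixed number of GDA steps on the regularized subproblem
        f(x, y) + (gamma/2)(x - x_k)^2 - (gamma/2)(y - y_k)^2,
    which is strongly convex in x and strongly concave in y because gamma > rho.
    """
    x_k, y_k = x0, y0
    for _ in range(outer_steps):
        x, y = x_k, y_k  # warm start at the current proximal center
        for _ in range(inner_steps):
            gx = grad_x(x, y) + gamma * (x - x_k)  # descend in x
            gy = grad_y(x, y) - gamma * (y - y_k)  # ascend in y
            x, y = x - eta * gx, y + eta * gy
        x_k, y_k = x, y  # the inexact solution becomes the next center
    return x_k, y_k

x_star, y_star = inexact_proximal_point(1.0, 1.0)
print(f"approximate stationary point: ({x_star:.2e}, {y_star:.2e})")
```

Here `gamma = 1.0` exceeds $\rho = 1/4$, and the iterates approach the unique stationary point $(0, 0)$ of this toy problem. The sketch does not reproduce the paper's specific inner solvers or stopping rules.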

AdAUC: End-to-end Adversarial AUC Optimization Against Long-tail Problems

This work reformulates the AUC optimization problem as a saddle-point problem whose objective becomes an instance-wise function, generates adversarial examples by computing the gradient of the resulting min-max problem, and provides a convergence guarantee for the proposed algorithm.

Semi-Implicit Hybrid Gradient Methods with Application to Adversarial Robustness

This work generalizes the stochastic primal-dual hybrid gradient algorithm to develop semi-implicit hybrid gradient methods (SI-HGs) for finding stationary points of nonconvex-nonconcave minimax problems, and shows that it outperforms other adversarial training (AT) algorithms in terms of convergence speed and robustness.

Accelerated Single-Call Methods for Constrained Min-Max Optimization

  • Yang Cai, Weiqiang Zheng
  • Computer Science
  • 2022
We study first-order methods for constrained min-max optimization. Existing methods require either two gradient calls or two projections in each iteration, which may be costly in applications. In…

Escaping limit cycles: Global convergence for constrained nonconvex-nonconcave minimax problems

A new extragradient-type algorithm is proposed for a class of nonconvex-nonconcave minimax problems. It is applicable to constrained and regularized problems and uses an adaptive stepsize that allows potentially larger stepsizes.
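The basic template behind extragradient-type methods can be sketched as follows. The sketch assumes a fixed stepsize rather than the adaptive rule described above, a hypothetical bilinear problem $f(x,y) = xy$ constrained to the box $[-1,1]^2$, and hypothetical helper names. Each iteration takes a projected trial step. It then steps again from the original point, using the operator evaluated at the trial point.

```python
import math

def F(z):
    # Saddle operator (df/dx, -df/dy) of the hypothetical problem f(x, y) = x * y.
    x, y = z
    return (y, -x)

def project_box(z, lo=-1.0, hi=1.0):
    # Euclidean projection onto the box [lo, hi]^2 (componentwise clipping).
    return tuple(min(max(zi, lo), hi) for zi in z)

def projected_extragradient(z0, eta=0.5, iters=300):
    # Fixed stepsize eta < 1/L, where L = 1 for this operator.
    z = project_box(z0)
    for _ in range(iters):
        g = F(z)
        # Trial (extrapolation) step.
        z_half = project_box(tuple(zi - eta * gi for zi, gi in zip(z, g)))
        g_half = F(z_half)
        # Update step from z, using the operator at the trial point.
        z = project_box(tuple(zi - eta * gi for zi, gi in zip(z, g_half)))
    return z

z = projected_extragradient((1.0, 1.0))
print("distance to the saddle point (0, 0):", math.hypot(*z))
```

On this monotone toy problem, a single projected gradient step per iteration would cycle. The extra trial step is what makes the iterates contract toward $(0, 0)$.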

High Probability Generalization Bounds with Fast Rates for Minimax Problems

This paper provides improved generalization analyses and obtains sharper high-probability generalization bounds for most existing generalization measures of minimax problems. It then uses the improved learning bounds to establish high-probability generalization bounds with fast rates for the classical empirical saddle point (ESP) solution and several popular gradient-based optimization algorithms.

Near-Optimal Algorithms for Making the Gradient Small in Stochastic Minimax Optimization

A novel stochastic algorithm called Recursive Anchored IteratioN (RAIN) is designed, and it is shown that RAIN achieves near-optimal stochastic oracle complexity for stochastic minimax optimization in both the convex-concave and strongly-convex-strongly-concave cases.

Accelerated Algorithms for Monotone Inclusions and Constrained Nonconvex-Nonconcave Min-Max Optimization

It is proved that the Extra Anchored Gradient algorithm, originally proposed by Yoon and Ryu for unconstrained convex-concave min-max optimization, can be applied to solve the more general problem of Lipschitz monotone inclusion.
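Below is a sketch of the constant-stepsize variant of Extra Anchored Gradient (EAG-C), as recalled from Yoon and Ryu. The anchoring weight $\beta_k = 1/(k+2)$ and the stepsize bound $\alpha \le 1/(8L)$ are assumptions to check against the original paper. The test operator is the hypothetical bilinear saddle operator $F(x,y) = (y, -x)$, which is monotone and 1-Lipschitz, so $\alpha = 1/8$. The function name `eag_c` is hypothetical.

```python
import math

def F(z):
    # Saddle operator (df/dx, -df/dy) of the hypothetical problem f(x, y) = x * y;
    # it is monotone and 1-Lipschitz.
    x, y = z
    return (y, -x)

def eag_c(z0, alpha=0.125, iters=1000):
    """Sketch of EAG-C: anchored extragradient steps with a constant stepsize."""
    z = z0
    for k in range(iters):
        beta = 1.0 / (k + 2)  # anchoring weight toward the starting point z0
        # Anchored base point shared by both half steps.
        base = tuple(zi + beta * (ai - zi) for zi, ai in zip(z, z0))
        g = F(z)
        z_half = tuple(b - alpha * gi for b, gi in zip(base, g))
        g_half = F(z_half)
        z = tuple(b - alpha * gi for b, gi in zip(base, g_half))
    return z

z0 = (1.0, 1.0)
z = eag_c(z0)
print("||F(z0)|| =", math.hypot(*F(z0)), " ||F(z_K)|| =", math.hypot(*F(z)))
```

The quantity printed is the operator norm $\|F(z)\|$, the "gradient small" criterion that anchored methods are designed to drive down.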

First-Order Algorithms for Min-Max Optimization in Geodesic Metric Spaces

It is proved that the Riemannian corrected extragradient (RCEG) method achieves last-iterate convergence at a linear rate in the geodesically strongly-convex-concave case, matching the Euclidean result.

Nest Your Adaptive Algorithm for Parameter-Agnostic Nonconvex Minimax Optimization

A nested adaptive framework, NeAda, is formally introduced. It has an inner loop that adaptively maximizes the dual variable with controllable stopping criteria and an outer loop that adaptively minimizes the primal variable. It is claimed to be the first algorithm that simultaneously achieves near-optimal convergence rates and parameter-agnostic adaptation in the nonconvex minimax setting.

Stochastic Gradient Methods with Compressed Communication for Decentralized Saddle Point Problems

The proposed algorithms are the first to achieve sublinear/linear computation and communication complexities for non-smooth strongly-convex-strongly-concave saddle-point problems in the decentralized setting. They use stochastic gradient and stochastic variance-reduced gradient oracles, respectively, with compressed information exchange.

References

SHOWING 1-10 OF 81 REFERENCES

Non-Convex Min-Max Optimization: Provable Algorithms and Applications in Machine Learning

This paper proposes a proximally guided stochastic subgradient method and a proximally guided stochastic variance-reduced method for expected and finite-sum saddle-point problems, respectively. It establishes the computational complexities of both methods for finding a nearly stationary point of the corresponding minimization problem.

Stochastic Variance Reduction Methods for Saddle-Point Problems

Convex-concave saddle-point problems where the objective functions may be split in many components are considered, and recent stochastic variance reduction methods are extended to provide the first large-scale linearly convergent algorithms.

Stochastic model-based minimization of weakly convex functions

This work shows that under weak-convexity and Lipschitz conditions, the algorithm drives the expected norm of the gradient of the Moreau envelope to zero at the rate of $O(k^{-1/4})$.
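The stationarity measure in this rate uses the Moreau envelope $M_\lambda f(x) = \min_y \{ f(y) + \frac{1}{2\lambda}\|y - x\|^2 \}$. For a $\rho$-weakly convex $f$ and $\lambda < 1/\rho$, its gradient is $\nabla M_\lambda f(x) = (x - \mathrm{prox}_{\lambda f}(x))/\lambda$. The minimal sketch below uses the hypothetical choice $f(x) = |x|$, which is convex, so any $\lambda > 0$ works. Its proximal map is soft-thresholding. The function names are hypothetical.

```python
import math

def prox_abs(x, lam):
    # Proximal map of lam * |.|: soft-thresholding toward zero.
    return math.copysign(max(abs(x) - lam, 0.0), x)

def moreau_envelope_grad_abs(x, lam):
    # Gradient of the Moreau envelope of f(x) = |x|: (x - prox_{lam f}(x)) / lam.
    return (x - prox_abs(x, lam)) / lam

for x in (3.0, 0.2, -0.2, -3.0):
    print(x, moreau_envelope_grad_abs(x, 0.5))
```

Away from the kink, the envelope gradient equals the slope $\pm 1$ of $|x|$. Near zero it shrinks linearly as $x/\lambda$. This is why a small $\|\nabla M_\lambda f(x)\|$ certifies that $x$ is close to a nearly stationary point, even for a nonsmooth $f$.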

Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions

A universal stagewise optimization framework is proposed for a broad family of weakly convex problems. Its key feature is that, at each stage, any suitable stochastic convex optimization algorithm that returns an averaged solution can be employed to minimize a regularized convex problem.

An efficient primal dual prox method for non-smooth optimization

A primal-dual prox method is developed that solves the minimax optimization problem at a rate of $O(1/T)$, assuming the proximal step can be solved efficiently. This is significantly faster than a standard subgradient descent method, which has an $O(1/\sqrt{T})$ convergence rate.

On the Convergence Rate of Stochastic Mirror Descent for Nonsmooth Nonconvex Optimization

It is proved that SMD, without the use of mini-batches, is guaranteed to converge to a stationary point at a rate of $O(1/\sqrt{t})$. This matches existing results for the stochastic subgradient method, but is evaluated under a stronger stationarity measure.

Accelerated gradient methods for nonconvex nonlinear and stochastic programming

The AG method is generalized to solve nonconvex and possibly stochastic optimization problems. With a properly specified stepsize policy, the AG method is shown to exhibit the best known rate of convergence for solving general nonconvex smooth optimization problems using first-order information, similarly to the gradient descent method.

Proximally Guided Stochastic Subgradient Method for Nonsmooth, Nonconvex Problems

A simple proof is presented that the proposed algorithm converges at the same rate as the stochastic gradient method for smooth nonconvex problems. This appears to be the first convergence rate analysis of a stochastic subgradient method for the class of weakly convex functions.

Unified Convergence Analysis of Stochastic Momentum Methods for Convex and Non-convex Optimization

The empirical results for optimizing deep neural networks demonstrate that, among the three stochastic methods, the stochastic variant of Nesterov's accelerated gradient method achieves a good tradeoff between speed of convergence in training error and robustness of convergence in testing error.

The Limit Points of (Optimistic) Gradient Descent in Min-Max Optimization

This work characterizes the limit points of two basic first-order methods, namely Gradient Descent/Ascent (GDA) and Optimistic Gradient Descent Ascent (OGDA), and shows that both dynamics avoid unstable critical points for almost all initializations.
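The qualitative difference between the two dynamics is easiest to see on the classic bilinear example $f(x,y) = xy$. This example is an illustrative choice, not taken from this paper. Simultaneous GDA spirals away from the saddle point $(0,0)$. OGDA, with the update $z_{k+1} = z_k - 2\eta F(z_k) + \eta F(z_{k-1})$, converges. The minimal sketch below uses hypothetical function names and a fixed stepsize $\eta = 0.1$.

```python
import math

def F(z):
    # Saddle operator (df/dx, -df/dy) of the bilinear test problem f(x, y) = x * y.
    x, y = z
    return (y, -x)

def gda(z0, eta=0.1, iters=2000):
    # Simultaneous gradient descent in x and ascent in y.
    z = z0
    for _ in range(iters):
        g = F(z)
        z = tuple(zi - eta * gi for zi, gi in zip(z, g))
    return z

def ogda(z0, eta=0.1, iters=2000):
    # Optimistic GDA: corrects each step with the previous operator value.
    # The previous value is initialized as F(z0), i.e. z_{-1} = z_0.
    z, g_prev = z0, F(z0)
    for _ in range(iters):
        g = F(z)
        z = tuple(zi - 2 * eta * gi + eta * gpi
                  for zi, gi, gpi in zip(z, g, g_prev))
        g_prev = g
    return z

z0 = (1.0, 1.0)
print("GDA distance to saddle: ", math.hypot(*gda(z0)))
print("OGDA distance to saddle:", math.hypot(*ogda(z0)))
```

For this problem, each GDA step multiplies the distance to $(0,0)$ by exactly $\sqrt{1+\eta^2} > 1$, while the optimistic correction turns the rotation into a slow inward spiral.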
...