• Corpus ID: 238583275

Finding Second-Order Stationary Point for Nonconvex-Strongly-Concave Minimax Problem

@article{Luo2021FindingSS,
  title={Finding Second-Order Stationary Point for Nonconvex-Strongly-Concave Minimax Problem},
  author={Luo Luo and Cheng Chen},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.04814}
}
  • Luo Luo, Cheng Chen
  • Published 10 October 2021
  • Computer Science, Mathematics
  • ArXiv
We study the smooth minimax optimization problem of the form $\min_x \max_y f(x,y)$, where the objective function is strongly concave in $y$ but possibly nonconvex in $x$. This problem has many applications in machine learning, such as regularized GANs, reinforcement learning, and adversarial training. Most existing theory for gradient descent ascent focuses on establishing convergence to a first-order stationary point of $f(x,y)$ or of the primal function $P(x) \triangleq \max_y f(x,y)$…
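
For orientation, the stationarity notions involved can be written as follows; these are the standard conventions for nonconvex-strongly-concave problems, and the paper's exact tolerances may differ. With $y^*(x) = \arg\max_y f(x,y)$ (unique by strong concavity), a Danskin-type argument gives $\nabla P(x) = \nabla_x f(x, y^*(x))$, and

  \text{first-order stationary point:}\quad \|\nabla P(x)\| \le \epsilon,
  \text{second-order stationary point:}\quad \|\nabla P(x)\| \le \epsilon \ \text{ and } \ \lambda_{\min}\!\big(\nabla^2 P(x)\big) \ge -\sqrt{\rho\,\epsilon},

where $\rho$ denotes a Hessian-Lipschitz parameter of $P$.
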
Citations

Faster Single-loop Algorithms for Minimax Optimization without Strong Concavity
TLDR
Establishes new convergence results for two alternative single-loop algorithms – alternating GDA and smoothed GDA – under the mild assumption that the objective satisfies the Polyak-Łojasiewicz (PL) condition with respect to one variable.
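
As a quick illustration, here is a minimal sketch contrasting plain (simultaneous) GDA with the alternating variant mentioned above; the toy objective and step sizes are hypothetical and chosen only for illustration, and smoothed GDA (which adds an auxiliary smoothing sequence) is not shown.

import numpy as np

# Toy objective f(x, y) = -cos(x) * y - 0.5 * y**2: nonconvex in x, 1-strongly concave in y.
def grad_x(x, y):
    return np.sin(x) * y

def grad_y(x, y):
    return -np.cos(x) - y

def simultaneous_gda(x, y, eta_x=0.01, eta_y=0.1, steps=2000):
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)   # both gradients evaluated at the same iterate
        x, y = x - eta_x * gx, y + eta_y * gy
    return x, y

def alternating_gda(x, y, eta_x=0.01, eta_y=0.1, steps=2000):
    for _ in range(steps):
        x = x - eta_x * grad_x(x, y)          # descent step on x first
        y = y + eta_y * grad_y(x, y)          # ascent step on y uses the freshly updated x
    return x, y

print(simultaneous_gda(1.0, 0.0))
print(alternating_gda(1.0, 0.0))
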
Fast Objective and Duality Gap Convergence for Non-convex Strongly-concave Min-max Problems
TLDR
This paper considers leveraging the Polyak-Łojasiewicz (PL) condition to design faster stochastic algorithms with stronger convergence guarantees, and proposes and analyzes proximal epoch-based methods that establish fast convergence in terms of both the primal objective gap and the duality gap.
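
Both citing works above relax strong concavity to the Polyak-Łojasiewicz (PL) condition. In its standard form, a differentiable function $h$ with minimum value $h^*$ satisfies the PL condition with parameter $\mu > 0$ if

  \frac{1}{2}\,\|\nabla h(x)\|^2 \;\ge\; \mu\,\big(h(x) - h^*\big) \qquad \text{for all } x.

For the maximization variable, the condition is imposed on $-f(x,\cdot)$ for each fixed $x$; strong concavity in $y$ implies it, but not conversely.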

References

SHOWING 1-10 OF 31 REFERENCES
What is Local Optimality in Nonconvex-Nonconcave Minimax Optimization?
TLDR
Proposes a proper mathematical definition of local optimality for this sequential setting, termed local minimax, and presents its properties and existence results.
Sub-sampled Cubic Regularization for Non-convex Optimization
TLDR
This work provides a sampling scheme that gives sufficiently accurate gradient and Hessian approximations to retain the strong global and local convergence guarantees of cubically regularized methods, and is the first work that gives global convergence guarantees for a sub-sampled variant of cubic regularization on non-convex functions.
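
For context, cubic regularization builds on the Nesterov-Polyak model: at the current iterate $x$, the step minimizes a cubic-regularized second-order model, and the sub-sampled variant studied here replaces the exact gradient and Hessian with mini-batch estimates:

  x^+ = x + \arg\min_s \; \nabla f(x)^\top s + \tfrac{1}{2}\, s^\top \nabla^2 f(x)\, s + \tfrac{M}{6}\,\|s\|^3,

where $M > 0$ is chosen to upper-bound the Lipschitz constant of the Hessian.
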
Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition
TLDR
This paper identifies the strict saddle property for non-convex problems, which allows for efficient optimization of orthogonal tensor decomposition, and shows that stochastic gradient descent converges to a local minimum in a polynomial number of iterations.
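
The strict saddle property is commonly stated with parameters $(\alpha, \gamma, \epsilon, \delta)$ (the paper's exact formulation may differ slightly): every point $x$ satisfies at least one of

  \|\nabla f(x)\| \ge \epsilon, \qquad \lambda_{\min}\!\big(\nabla^2 f(x)\big) \le -\gamma, \qquad \|x - x^\star\| \le \delta \ \text{ for some local minimizer } x^\star \text{ near which } f \text{ is } \alpha\text{-strongly convex},

so at any point there is either a large gradient, a direction of sufficiently negative curvature, or proximity to a well-behaved local minimum.
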
The Complexity of Nonconvex-Strongly-Concave Minimax Optimization
TLDR
Introduces a generic acceleration scheme that deploys existing gradient-based methods to solve a sequence of crafted strongly-convex-strongly-concave subproblems, removing an additional polylogarithmic dependence on accuracy present in previous works.
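
One common way to craft such subproblems (the exact construction in the paper may differ) is to add a proximal term in $x$ around an anchor point $\tilde{x}$,

  \min_x \max_y \; f(x,y) + \lambda\,\|x - \tilde{x}\|^2,

with $\lambda$ larger than the smoothness constant of $f$ in $x$, so that the subproblem becomes strongly convex in $x$ while remaining strongly concave in $y$.
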
Stochastic Cubic Regularization for Fast Nonconvex Optimization
TLDR
The proposed algorithm efficiently escapes saddle points and finds approximate local minima for general smooth, nonconvex functions in only $\tilde{\mathcal{O}}(\epsilon^{-3.5})$ stochastic gradient and stochastic Hessian-vector product evaluations.
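
In such methods the cubic subproblem itself can be minimized using only Hessian-vector products, without forming the Hessian. Below is a minimal sketch under that assumption; the estimator arguments grad_est and hvp_est, the step size, and the iteration count are hypothetical, and the paper's actual subproblem solver may differ.

import numpy as np

def cubic_subproblem_step(grad_est, hvp_est, M, lr=0.01, iters=200):
    # Minimize m(s) = g^T s + 0.5 * s^T H s + (M/6) * ||s||^3 by gradient descent on s,
    # where g = grad_est (a stochastic gradient estimate) and H s is accessed only
    # through hvp_est(s) (a stochastic Hessian-vector product estimate).
    s = np.zeros_like(grad_est)
    for _ in range(iters):
        grad_m = grad_est + hvp_est(s) + 0.5 * M * np.linalg.norm(s) * s  # gradient of m(s)
        s -= lr * grad_m
    return s

# Example on a fixed toy problem, with exact oracles standing in for the stochastic estimates.
g = np.array([1.0, -2.0])
H = np.array([[2.0, 0.0], [0.0, -1.0]])   # indefinite Hessian (saddle direction)
print(cubic_subproblem_step(g, lambda s: H @ s, M=5.0))
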
A Stochastic Proximal Point Algorithm for Saddle-Point Problems
TLDR
Proposes a stochastic proximal point algorithm that accelerates the variance-reduction method SAGA for saddle-point problems, adapts the algorithm to policy evaluation, and shows empirically that the method is much more efficient than state-of-the-art methods.
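
For reference, the deterministic proximal point template for a saddle-point problem $\min_x \max_y f(x,y)$ with step size $\eta$ solves one regularized saddle subproblem per iteration,

  (x_{k+1}, y_{k+1}) = \arg\min_x\,\arg\max_y\; f(x,y) + \frac{1}{2\eta}\,\|x - x_k\|^2 - \frac{1}{2\eta}\,\|y - y_k\|^2;

per the summary above, the paper's method combines a stochastic version of this template with SAGA-style variance reduction.
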
Stochastic Variance-Reduced Cubic Regularization Methods
TLDR
Proposes a stochastic variance-reduced cubic-regularized Newton method (SVRC) for non-convex optimization, which is guaranteed to converge to an $(\epsilon, \sqrt{\epsilon})$-approximate local minimum within $\tilde{O}(n^{4/5}/\epsilon^{3/2})$ second-order oracle calls, outperforming state-of-the-art cubic regularization algorithms, including subsampled cubic regularization.
Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning
TLDR
This paper proposes two single-timescale, single-loop algorithms which require only one data point per step and implement momentum updates on both primal and dual variables, achieving an $O(\varepsilon^{-4})$ sample complexity, which shows the important role of momentum in obtaining a single-timescale algorithm.
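
A minimal sketch of momentum updates on both primal and dual variables in a single-timescale, single-loop scheme (illustrative only; the toy objective, step size, and momentum parameter are hypothetical, and this is not the exact pair of algorithms proposed in the paper):

import numpy as np

# Toy objective f(x, y) = -cos(x) * y - 0.5 * y**2, as in the GDA sketch above.
def grad_x(x, y):
    return np.sin(x) * y

def grad_y(x, y):
    return -np.cos(x) - y

def momentum_gda(x, y, eta=0.05, beta=0.9, steps=3000):
    vx, vy = 0.0, 0.0                              # moving-average (momentum) gradient estimates
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)        # in the stochastic setting, single-sample estimates
        vx = beta * vx + (1 - beta) * gx
        vy = beta * vy + (1 - beta) * gy
        x, y = x - eta * vx, y + eta * vy          # one primal descent and one dual ascent step per iteration
    return x, y

print(momentum_gda(1.0, 0.0))
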
On the Convergence and Robustness of Training GANs with Regularized Optimal Transport
TLDR
This work shows that obtaining gradient information for the smoothed Wasserstein GAN formulation, which is based on regularized Optimal Transport (OT), is computationally inexpensive, and hence first-order optimization methods can be applied to minimize this objective.
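
For context, a typical entropic regularization of the OT objective (the paper's exact regularizer and smoothing may differ) takes the form

  W_\lambda(\mu, \nu) = \min_{\pi \in \Pi(\mu,\nu)} \; \mathbb{E}_{(u,v)\sim\pi}\big[\,c(u,v)\,\big] + \lambda\, \mathrm{KL}\big(\pi \,\|\, \mu \otimes \nu\big),

where $\Pi(\mu,\nu)$ is the set of couplings and $\lambda > 0$ controls the smoothing; the regularization makes the transport value smooth, which is what makes its gradients tractable for first-order training.
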
Finding approximate local minima faster than gradient descent
We design a non-convex second-order optimization algorithm that is guaranteed to return an approximate local minimum in time which scales linearly in the underlying dimension and the number of training examples.