# When Are Nonconvex Problems Not Scary?

@article{Sun2015WhenAN, title={When Are Nonconvex Problems Not Scary?}, author={Ju Sun and Qing Qu and John Wright}, journal={ArXiv}, year={2015}, volume={abs/1510.06096} }

In this note, we focus on smooth nonconvex optimization problems that obey: (1) all local minimizers are also global; and (2) around any saddle point or local maximizer, the objective has a negative directional curvature. Concrete applications such as dictionary learning, generalized phase retrieval, and orthogonal tensor decomposition are known to induce such structures. We describe a second-order trust-region algorithm that provably converges to a global minimizer efficiently, without special…
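The two structural conditions in the abstract can be illustrated on a toy function. The sketch below is not the paper's trust-region algorithm. It is a minimal stand-in that takes gradient steps and, when the gradient vanishes, moves along the most negative Hessian eigenvector. The function `f(x, y) = (x^2 - 1)^2 + y^2`, the helper names `grad`, `hess`, and `descend`, and the step size are all assumptions chosen for illustration. This `f` has two global minimizers at `(±1, 0)` and a single saddle at the origin with Hessian `diag(-4, 2)`, so both conditions hold.

```python
import numpy as np

def f(p):
    # Toy objective (assumed for illustration): minimizers at (+-1, 0), saddle at (0, 0).
    x, y = p
    return (x**2 - 1.0)**2 + y**2

def grad(p):
    x, y = p
    return np.array([4.0 * x * (x**2 - 1.0), 2.0 * y])

def hess(p):
    x, y = p
    return np.array([[12.0 * x**2 - 4.0, 0.0], [0.0, 2.0]])

def descend(p0, step=0.05, iters=500, tol=1e-8):
    """Gradient steps, plus a negative-curvature step at first-order stationary points."""
    p = np.asarray(p0, dtype=float)
    for _ in range(iters):
        g = grad(p)
        if np.linalg.norm(g) > tol:
            p = p - step * g  # ordinary gradient step
            continue
        # Gradient is (numerically) zero: inspect the curvature.
        eigvals, eigvecs = np.linalg.eigh(hess(p))  # eigenvalues in ascending order
        if eigvals[0] >= 0:
            break  # no negative curvature: second-order stationary point
        p = p + step * eigvecs[:, 0]  # escape along the negative-curvature direction
    return p

saddle = np.array([0.0, 0.0])
result = descend(saddle)
```

Plain gradient descent started exactly at the saddle never moves, because `grad(saddle)` is zero. The curvature check is what lets the iterate leave the origin and reach one of the two global minimizers.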

## 146 Citations

Why Do Local Methods Solve Nonconvex Problems?

- Computer Science, Mathematics
- Beyond the Worst-Case Analysis of Algorithms
- 2020

This work hypothesizes a unified explanation for why local methods solve nonconvex problems: most local minima of the objectives used in practice are approximately global minima. It rigorously formalizes this hypothesis for concrete instances of machine learning problems.

No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis

- Computer Science, Mathematics
- ICML
- 2017

A new framework captures the landscape shared by common non-convex low-rank matrix problems, including matrix sensing, matrix completion, and robust PCA. It shows that all local minima are also globally optimal and that no high-order saddle points exist.

When Are Nonconvex Optimization Problems Not Scary

- Computer Science
- 2016

This talk will highlight a family of nonconvex problems that can be solved to global optimality using simple numerical methods, independent of initialization. These problems share a characteristic global structure: all local minimizers are global, and all saddle points have negative directional curvature.

The Geometric Effects of Distributing Constrained Nonconvex Optimization Problems

- Computer Science
- 2019 IEEE 8th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP)
- 2019

It is shown that the first- and second-order stationary points of the centralized and distributed problems are in one-to-one correspondence, implying that the distributed problem, in spite of its additional variables and constraints, can inherit the benign geometry of its centralized counterpart.

Cubic Regularized ADMM with Convergence to a Local Minimum in Non-convex Optimization

- Computer Science, Mathematics
- 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
- 2019

This paper proposes the Cubic Regularized Alternating Direction Method of Multipliers (CR-ADMM) to escape saddle points of separable non-convex functions containing a non-Hessian-Lipschitz component, and proves that CR-ADMM converges to a local minimum of the original function at a rate of $O(1/T^{1/3})$ in time horizon $T$, which is faster than gradient-based methods.

Efficient Dictionary Learning with Gradient Descent

- Computer Science, Mathematics
- ICML
- 2019

This work provides convergence guarantees for randomly initialized gradient descent to the neighborhood of a global optimum and provides evidence that this feature is shared by other important nonconvex problems as well.

Active strict saddles in nonsmooth optimization

- Mathematics, Computer Science
- ArXiv
- 2019

It is argued that the strict saddle property may be a realistic assumption in applications, since it provably holds for generic semi-algebraic optimization problems.

Efficient approaches for escaping higher order saddle points in non-convex optimization

- Computer Science, Mathematics
- COLT
- 2016

This paper designs the first efficient algorithm guaranteed to converge to a third order local optimum (while existing techniques are at most second order), and shows that it is NP-hard to extend this further to finding fourth order local optima.

Sub-sampled Cubic Regularization for Non-convex Optimization

- Computer Science, Mathematics
- ICML
- 2017

This work provides a sampling scheme that gives sufficiently accurate gradient and Hessian approximations to retain the strong global and local convergence guarantees of cubically regularized methods, and is the first work that gives global convergence guarantees for a sub-sampled variant of cubic regularization on non-convex functions.

Proximal Methods Avoid Active Strict Saddles of Weakly Convex Functions

- Mathematics
- Foundations of Computational Mathematics
- 2021

We introduce a geometrically transparent strict saddle property for nonsmooth functions. This property guarantees that simple proximal algorithms on weakly convex problems converge only to local…

## References

SHOWING 1-10 OF 89 REFERENCES

Efficient approaches for escaping higher order saddle points in non-convex optimization

- Computer Science, Mathematics
- COLT
- 2016

This paper designs the first efficient algorithm guaranteed to converge to a third order local optimum (while existing techniques are at most second order), and shows that it is NP-hard to extend this further to finding fourth order local optima.

Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition

- Mathematics, Computer Science
- COLT
- 2015

This paper identifies a strict saddle property for non-convex problems that allows for efficient optimization of orthogonal tensor decomposition, and shows that stochastic gradient descent converges to a local minimum in a polynomial number of iterations.

The trust region subproblem and semidefinite programming

- Mathematics, Computer Science
- Optim. Methods Softw.
- 2004

This article provides an in-depth study of the trust region subproblem (TRS) and its properties, as well as a survey of recent advances, with both theoretical and empirical evidence illustrating the strength of the SDP and duality approach.

On the low-rank approach for semidefinite programs arising in synchronization and community detection

- Mathematics, Computer Science
- COLT
- 2016

This work focuses on Synchronization and Community Detection problems and provides theoretical guarantees shedding light on the remarkable efficiency of the heuristic proposed over a decade ago.

Local Minima and Convergence in Low-Rank Semidefinite Programming

- Mathematics, Computer Science
- Math. Program.
- 2005

The local minima of LRSDPr are classified, and the optimal convergence of a slight variant of the successful, yet experimental, algorithm of Burer and Monteiro is proved; this algorithm handles LRSDPr via the nonconvex change of variables $X = RR^T$.

A Convergent Gradient Descent Algorithm for Rank Minimization and Semidefinite Programming from Random Linear Measurements

- Computer Science, Mathematics
- NIPS
- 2015

We propose a simple, scalable, and fast gradient descent algorithm to optimize a nonconvex objective for the rank minimization problem and a closely related family of semidefinite programs. With…

A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization

- Mathematics, Computer Science
- Math. Program.
- 2003

A nonlinear programming algorithm for solving semidefinite programs (SDPs) in standard form that replaces the symmetric, positive semidefinite variable $X$ with a rectangular variable $R$ according to the factorization $X = RR^T$.

Guaranteed Matrix Completion via Nonconvex Factorization

- Computer Science
- 2015 IEEE 56th Annual Symposium on Foundations of Computer Science
- 2015

This paper establishes a theoretical guarantee that the factorization-based formulation correctly recovers the underlying low-rank matrix, and is the first to provide an exact recovery guarantee for many standard algorithms such as gradient descent, SGD, and block coordinate gradient descent.

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

- Computer Science, Mathematics
- NIPS
- 2014

This paper proposes a new approach to second-order optimization, the saddle-free Newton method, which can rapidly escape high-dimensional saddle points, unlike gradient descent and quasi-Newton methods. It applies this algorithm to deep and recurrent neural network training and provides numerical evidence for its superior optimization performance.

Some NP-complete problems in quadratic and nonlinear programming

- Mathematics, Computer Science
- Math. Program.
- 1987

A special class of indefinite quadratic programs is constructed, with simple constraints and integer data, and it is shown that checking (a) or (b) on this class is NP-complete.