# Escaping Saddle-Points Faster under Interpolation-like Conditions

```bibtex
@article{Roy2020EscapingSF,
  title   = {Escaping Saddle-Points Faster under Interpolation-like Conditions},
  author  = {Abhishek Roy and Krishnakumar Balasubramanian and Saeed Ghadimi and Prasant Mohapatra},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2009.13016}
}
```

In this paper, we show that under over-parametrization, several standard stochastic optimization algorithms escape saddle-points and converge to local minimizers much faster. One of the fundamental aspects of over-parametrized models is that they are capable of interpolating the training data. We show that, under interpolation-like assumptions satisfied by the stochastic gradients in an over-parametrization setting, the first-order oracle complexity of Perturbed Stochastic Gradient Descent (PSGD…
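The core mechanism of perturbed SGD can be illustrated with a minimal sketch: take stochastic gradient steps, and whenever the observed gradient is small (a candidate saddle-point), inject isotropic noise to push the iterate off the saddle. This is an illustrative toy, not the paper's exact algorithm; the function name, step size, and thresholds below are assumptions.

```python
import numpy as np

def perturbed_sgd(grad, x0, lr=0.01, noise_radius=0.1,
                  grad_threshold=0.05, steps=1000, seed=0):
    """Sketch of perturbed SGD: gradient steps plus isotropic
    perturbations whenever the observed gradient is small."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad(x, rng)                        # stochastic gradient oracle
        if np.linalg.norm(g) < grad_threshold:  # near a stationary point
            x = x + noise_radius * rng.standard_normal(x.shape)
        else:
            x = x - lr * g
    return x
```

On the toy saddle $f(x_1, x_2) = (x_1^2 - x_2^2)/2$, started exactly at the saddle $(0, 0)$, the perturbation moves the iterate off the origin, after which the negative-curvature coordinate $x_2$ grows geometrically.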


## 2 Citations

### Zeroth-order algorithms for nonconvex–strongly-concave minimax problems with improved complexities

- Computer Science · Journal of Global Optimization
- 2022

This paper designs and analyzes a zeroth-order algorithm for minimax optimization problems that are nonconvex in one variable and strongly-concave in the other variable, and proposes the Zeroth-Order Gradient Descent Multi-Step Ascent algorithm, which improves the oracle complexity of ZO-GDA.
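As a rough illustration of the multi-step-ascent idea (not that paper's exact method), the sketch below alternates several zeroth-order ascent steps on the max variable with one zeroth-order descent step on the min variable, using an averaged two-point finite-difference gradient estimator. All names, step sizes, and sample counts are illustrative assumptions.

```python
import numpy as np

def zo_gd_multistep_ascent(f, x, y, mu=1e-5, lr_x=0.05, lr_y=0.1,
                           ascent_steps=5, outer_steps=200, seed=0):
    """For min_x max_y f(x, y): several zeroth-order ascent steps on y,
    then one zeroth-order descent step on x, per outer iteration."""
    rng = np.random.default_rng(seed)

    def zo_grad(h, z, samples=10):
        # averaged two-point finite-difference gradient estimate
        g = np.zeros_like(z)
        for _ in range(samples):
            u = rng.standard_normal(z.shape)
            g += (h(z + mu * u) - h(z - mu * u)) / (2 * mu) * u
        return g / samples

    for _ in range(outer_steps):
        for _ in range(ascent_steps):   # inner maximization over y
            y = y + lr_y * zo_grad(lambda yy: f(x, yy), y)
        x = x - lr_x * zo_grad(lambda xx: f(xx, y), x)
    return x, y
```

On the strongly-concave-in-$y$ toy problem $f(x, y) = xy - y^2/2$, whose min-max point is the origin, both iterates contract toward zero.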

### Zeroth-Order Algorithms for Nonconvex Minimax Problems with Improved Complexities

- Computer Science · ArXiv
- 2020

This paper designs and analyzes the Zeroth-Order Gradient Descent Ascent algorithm, provides improved oracle-complexity results compared to existing works, and proposes a new algorithm that further improves on that oracle complexity.

## References

Showing 1–10 of 69 references

### Sharp Analysis for Nonconvex SGD Escaping from Saddle Points

- Computer Science · COLT
- 2019

A sharp analysis for Stochastic Gradient Descent is given and it is proved that SGD is able to efficiently escape from saddle points and find an approximate second-order stationary point in $\tilde{O}(\epsilon^{-3.5})$ stochastic gradient computations for generic nonconvex optimization problems, when the objective function satisfies gradient-Lipschitz, Hessian-Lipschitz, and dispersive noise assumptions.

### Escaping Saddle Points in Constrained Optimization

- Computer Science, Mathematics · NeurIPS
- 2018

This paper proposes a generic framework that yields convergence to a second-order stationary point of the problem, provided the convex set $\mathcal{C}$ is simple enough that quadratic objective functions can be optimized over it.

### Escaping Saddle Points for Zeroth-order Non-convex Optimization using Estimated Gradient Descent

- Computer Science · 2020 54th Annual Conference on Information Sciences and Systems (CISS)
- 2020

It is shown that the proposed model-free non-convex optimization algorithm returns an ε-second-order stationary point with queries of the function for any arbitrary θ > 0.

### Stochastic Cubic Regularization for Fast Nonconvex Optimization

- Computer Science, Mathematics · NeurIPS
- 2018

The proposed algorithm efficiently escapes saddle points and finds approximate local minima for general smooth, nonconvex functions in only $\mathcal{\tilde{O}}(\epsilon^{-3.5})$ stochastic gradient and stochastic Hessian-vector product evaluations.

### Gradient Descent Efficiently Finds the Cubic-Regularized Non-Convex Newton Step

- Mathematics, Computer Science · ArXiv
- 2016

We consider the minimization of non-convex quadratic forms regularized by a cubic term, which exhibit multiple saddle points and poor local minima. Nonetheless, we prove that, under mild assumptions,…
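The object studied there is the cubic-regularized model $m(s) = g^\top s + \tfrac{1}{2} s^\top H s + \tfrac{\rho}{3}\|s\|^3$, and the surprising claim is that plain gradient descent minimizes it despite its non-convexity. A hypothetical sketch (the setting of that paper, though not its exact procedure; step size and iteration count are illustrative):

```python
import numpy as np

def cubic_reg_gd(g, H, rho, lr=0.01, steps=2000):
    """Gradient descent on m(s) = g^T s + 0.5 s^T H s + (rho/3)||s||^3,
    which can be non-convex when H has negative eigenvalues."""
    s = np.zeros_like(g)
    for _ in range(steps):
        grad_m = g + H @ s + rho * np.linalg.norm(s) * s
        s = s - lr * grad_m
    return s
```

Note that when $g = 0$ the origin is itself a stationary point of $m$ and this sketch stalls there; handling such cases (e.g. with a perturbation) is part of what the paper's analysis addresses.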

### Efficiently avoiding saddle points with zero order methods: No gradients required

- Computer Science, Mathematics · NeurIPS
- 2019

This work establishes asymptotic convergence to second-order stationary points for derivative-free non-convex optimization algorithms that use only function evaluations rather than gradients, via a carefully tailored application of the Stable Manifold Theorem.

### Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation

- Computer Science, Mathematics · AISTATS
- 2020

This paper considers stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied by over-parameterized models, and shows that the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size.
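A single iteration of a regularized subsampled Newton method in the spirit of R-SSN might look like the sketch below, where `grad_batch` and `hess_batch` stand in for minibatch gradient and Hessian oracles and `tau` is the regularizer; all names and defaults are illustrative assumptions, not the paper's API.

```python
import numpy as np

def rssn_step(x, grad_batch, hess_batch, tau=1e-2, lr=1.0):
    """One regularized subsampled Newton step:
    x+ = x - lr * (H_S + tau * I)^{-1} g_S,
    with g_S, H_S computed on a minibatch and tau > 0 a regularizer."""
    g = grad_batch(x)
    H = hess_batch(x)
    step = np.linalg.solve(H + tau * np.eye(len(x)), g)
    return x - lr * step
```

On a strongly-convex quadratic $f(x) = \tfrac{1}{2} x^\top A x - b^\top x$, iterating this step converges linearly to $A^{-1} b$, with the contraction factor governed by $\tau / (\lambda + \tau)$ per eigenvalue $\lambda$ of $A$.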

### Random Gradient-Free Minimization of Convex Functions

- Computer Science, Mathematics · Found. Comput. Math.
- 2017

New complexity bounds are proved for convex optimization methods based only on computation of the function value; it appears that such methods usually need at most $n$ times more iterations than the standard gradient methods, where $n$ is the dimension of the space of variables.
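The gradient-free estimator underlying such methods is Gaussian smoothing: $\hat g = \frac{f(x + \mu u) - f(x)}{\mu}\, u$ with $u \sim \mathcal{N}(0, I)$, averaged over samples, which estimates the gradient of a smoothed version of $f$. A minimal sketch (function name, smoothing radius, and sample count are illustrative):

```python
import numpy as np

def gaussian_smoothing_grad(f, x, mu=1e-4, samples=2000, seed=0):
    """Estimate grad f(x) from function values only: average of
    (f(x + mu*u) - f(x)) / mu * u over Gaussian directions u."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    fx = f(x)
    g = np.zeros_like(x)
    for _ in range(samples):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - fx) / mu * u
    return g / samples
```

For $f(x) = \|x\|^2$ at $x = (1, 2)$ the estimate approaches the true gradient $(2, 4)$ as the sample count grows; the per-sample variance is what drives the dimension factor in the complexity bounds.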

### Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

- Computer Science · NIPS
- 2014

This paper proposes a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high dimensional saddle points, unlike gradient descent and quasi-Newton methods, and applies this algorithm to deep or recurrent neural network training, and provides numerical evidence for its superior optimization performance.
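The saddle-free Newton idea preconditions the gradient by $|H|$, the Hessian with its eigenvalues replaced by their absolute values, so that negative-curvature directions are descended rather than ascended. A sketch via an explicit eigendecomposition (practical only at small scale; the damping term is an illustrative guard against near-zero eigenvalues):

```python
import numpy as np

def saddle_free_newton_step(g, H, damping=1e-3):
    """Return -|H|^{-1} g, where |H| has the absolute eigenvalues of H.
    H is assumed symmetric; damping regularizes tiny eigenvalues."""
    w, V = np.linalg.eigh(H)
    return -(V @ ((V.T @ g) / (np.abs(w) + damping)))
```

For instance, with $H = \mathrm{diag}(2, -1)$ and $g = (2, -1)$, the ordinary Newton step $-H^{-1} g = (-1, -1)$ moves uphill along the negative-curvature coordinate, while the saddle-free step is approximately $(-1, 1)$, descending in both coordinates.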

### SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator

- Computer Science, Mathematics · NeurIPS
- 2018

This paper proposes a new technique named SPIDER, which can be used to track many deterministic quantities of interest with significantly reduced computational cost and proves that SPIDER-SFO nearly matches the algorithmic lower bound for finding approximate first-order stationary points under the gradient Lipschitz assumption in the finite-sum setting.
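The path-integrated estimator maintains $v_k = \nabla f_S(x_k) - \nabla f_S(x_{k-1}) + v_{k-1}$ on a small minibatch $S$, between periodic full-gradient resets. A finite-sum sketch (function names and hyperparameters are illustrative, not the paper's exact SPIDER-SFO):

```python
import numpy as np

def spider_sgd(grads, x0, lr=0.05, epoch_len=10, batch=4, steps=200, seed=0):
    """Finite-sum SGD with a SPIDER-style estimator:
    v_k = dF_S(x_k) - dF_S(x_{k-1}) + v_{k-1},
    reset to the full gradient every epoch_len steps.
    grads: list of per-sample gradient functions."""
    rng = np.random.default_rng(seed)
    n = len(grads)
    x = np.asarray(x0, dtype=float)
    x_prev = x.copy()
    v = np.zeros_like(x)
    for k in range(steps):
        if k % epoch_len == 0:          # periodic full-gradient reset
            v = sum(g(x) for g in grads) / n
        else:                           # path-integrated correction
            idx = rng.integers(0, n, size=batch)
            v = sum(grads[i](x) - grads[i](x_prev) for i in idx) / batch + v
        x_prev = x.copy()
        x = x - lr * v
    return x
```

The recursion is cheap because each correction term touches only the minibatch, yet $v_k$ tracks the full gradient along the iterate path; this is what yields the reduced oracle complexity.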