Corpus ID: 221970024

Escaping Saddle-Points Faster under Interpolation-like Conditions

@article{Roy2020EscapingSF,
  title={Escaping Saddle-Points Faster under Interpolation-like Conditions},
  author={Abhishek Roy and Krishnakumar Balasubramanian and Saeed Ghadimi and Prasant Mohapatra},
  journal={ArXiv},
  year={2020},
  volume={abs/2009.13016}
}
In this paper, we show that under over-parametrization several standard stochastic optimization algorithms escape saddle-points and converge to local-minimizers much faster. One of the fundamental aspects of over-parametrized models is that they are capable of interpolating the training data. We show that, under interpolation-like assumptions satisfied by the stochastic gradients in an over-parametrization setting, the first-order oracle complexity of Perturbed Stochastic Gradient Descent (PSGD… 
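As a rough, hedged illustration of the kind of method studied here (not the paper's exact pseudocode), the sketch below runs a generic Perturbed SGD loop: take SGD steps and, whenever the stochastic gradient is small, add an isotropic perturbation to help leave a possible saddle point. The step size, perturbation radius, gradient threshold, and the toy over-parametrized least-squares problem are all illustrative assumptions, not values from the paper.

```python
import numpy as np

def perturbed_sgd(grad_fn, x0, lr=0.01, g_thresh=1e-3, r=1e-2, n_iters=10_000, seed=0):
    """Generic Perturbed SGD sketch: SGD steps plus an isotropic perturbation
    whenever the stochastic gradient is small (a simple proxy for being near a
    saddle point)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        g = grad_fn(x, rng)                            # stochastic gradient oracle
        if np.linalg.norm(g) <= g_thresh:              # possibly near a saddle
            x = x + r * rng.standard_normal(x.shape)   # isotropic perturbation
        else:
            x = x - lr * g                             # plain SGD step
    return x

# Toy interpolation-style problem: over-parametrized least squares (more unknowns
# than equations), so the training data can be fit exactly and every per-sample
# gradient vanishes at a solution.
rng = np.random.default_rng(1)
A, w_star = rng.standard_normal((20, 50)), rng.standard_normal(50)
b = A @ w_star                                         # exactly interpolatable labels

def stoch_grad(w, rng):
    i = rng.integers(A.shape[0])                       # sample one data point
    return (A[i] @ w - b[i]) * A[i]

w = perturbed_sgd(stoch_grad, np.zeros(50))
print("final residual:", np.linalg.norm(A @ w - b))
```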
2 Citations

Zeroth-order algorithms for nonconvex–strongly-concave minimax problems with improved complexities

TLDR
This paper designs and analyzes a zeroth-order algorithm for minimax optimization problems that are nonconvex in one variable and strongly concave in the other, and proposes the Zeroth-Order Gradient Descent Multi-Step Ascent algorithm, which improves on the oracle complexity of ZO-GDA.

Zeroth-Order Algorithms for Nonconvex Minimax Problems with Improved Complexities

TLDR
This paper designs and analyzes the Zeroth-Order Gradient Descent Ascent (ZO-GDA) algorithm, providing improved oracle-complexity results compared to existing works, and proposes a new algorithm that significantly improves on the oracle complexity of ZO-GDA.
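Both citing papers above build on a zeroth-order gradient-descent-ascent pattern. The sketch below is a minimal, hedged illustration of that pattern (not the cited algorithms themselves): gradients in x and y are replaced by two-point Gaussian-smoothing estimates built from function values only. The smoothing parameter, step sizes, and the toy strongly-concave-in-y objective are illustrative assumptions.

```python
import numpy as np

def zo_grad(f, z, mu, rng):
    """Two-point Gaussian-smoothing gradient estimate of f at z."""
    u = rng.standard_normal(z.shape)
    return (f(z + mu * u) - f(z - mu * u)) / (2 * mu) * u

def zo_gda(f, x0, y0, lr_x=1e-2, lr_y=1e-1, mu=1e-4, n_iters=5000, seed=0):
    """Zeroth-order gradient descent (in x) / ascent (in y) sketch."""
    rng = np.random.default_rng(seed)
    x, y = np.array(x0, dtype=float), np.array(y0, dtype=float)
    for _ in range(n_iters):
        gx = zo_grad(lambda x_: f(x_, y), x, mu, rng)   # estimated descent direction in x
        gy = zo_grad(lambda y_: f(x, y_), y, mu, rng)   # estimated ascent direction in y
        x, y = x - lr_x * gx, y + lr_y * gy             # simultaneous update
    return x, y

# Toy objective, strongly concave in y: f(x, y) = <x, y> - ||y||^2 / 2,
# whose minimax solution is x = y = 0.
f = lambda x, y: x @ y - 0.5 * (y @ y)
x, y = zo_gda(f, np.ones(5), np.zeros(5))
print(x, y)   # both should drift toward 0
```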

References

Showing 1-10 of 69 references

Sharp Analysis for Nonconvex SGD Escaping from Saddle Points

TLDR
A sharp analysis for Stochastic Gradient Descent is given, and it is proved that SGD is able to efficiently escape from saddle points and find an approximate second-order stationary point in $\tilde{O}(\epsilon^{-3.5})$ stochastic gradient computations for generic nonconvex optimization problems, when the objective function satisfies gradient-Lipschitz, Hessian-Lipschitz, and dispersive noise assumptions.

Escaping Saddle Points in Constrained Optimization

TLDR
This paper proposes a generic framework that yields convergence to a second-order stationary point of the problem, provided the convex constraint set $\mathcal{C}$ is simple enough that a quadratic objective function can be optimized over it efficiently.

Escaping Saddle Points for Zeroth-order Non-convex Optimization using Estimated Gradient Descent

TLDR
It is shown that the proposed model-free non-convex optimization algorithm returns an ε-second-order stationary point using a polynomial number of function queries, for any arbitrary θ > 0.

Stochastic Cubic Regularization for Fast Nonconvex Optimization

TLDR
The proposed algorithm efficiently escapes saddle points and finds approximate local minima for general smooth, nonconvex functions in only $\tilde{\mathcal{O}}(\epsilon^{-3.5})$ stochastic gradient and stochastic Hessian-vector product evaluations.

Gradient Descent Efficiently Finds the Cubic-Regularized Non-Convex Newton Step

We consider the minimization of non-convex quadratic forms regularized by a cubic term, which exhibit multiple saddle points and poor local minima. Nonetheless, we prove that, under mild assumptions, …
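For concreteness, here is a minimal sketch of gradient descent applied to a cubic-regularized quadratic model, the subproblem this reference analyzes; the rho/3 cubic coefficient, step size, and toy indefinite Hessian are illustrative choices (conventions for the cubic coefficient vary across papers).

```python
import numpy as np

def cubic_subproblem_gd(g, H, rho, lr=1e-2, n_iters=5000):
    """Run plain gradient descent on the cubic-regularized model
        m(s) = g^T s + 0.5 * s^T H s + (rho / 3) * ||s||^3,
    starting from s = 0; its gradient is g + H s + rho * ||s|| * s."""
    s = np.zeros_like(g, dtype=float)
    for _ in range(n_iters):
        grad_m = g + H @ s + rho * np.linalg.norm(s) * s
        s = s - lr * grad_m
    return s

# Toy indefinite quadratic: H has a negative eigenvalue, so the unregularized
# model has a saddle at s = 0 and is unbounded below without the cubic term.
H = np.diag([2.0, -1.0])
g = np.array([0.1, 0.05])
print(cubic_subproblem_gd(g, H, rho=1.0))
```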

Efficiently avoiding saddle points with zero order methods: No gradients required

TLDR
This work establishes asymptotic convergence to second-order stationary points for derivative-free algorithms for non-convex optimization that use only function evaluations rather than gradients, via a carefully tailored application of the Stable Manifold Theorem.

Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation

TLDR
Stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied by over-parameterized models are considered, and it is shown that the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size.
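A minimal sketch of a regularized subsampled Newton step, in the spirit of R-SSN but simplified (fixed step size and fixed Tikhonov-style regularization instead of the adaptive step size described above); the batch size, regularizer, and toy least-squares usage are illustrative assumptions.

```python
import numpy as np

def r_ssn_sketch(grad_fn, hess_fn, x0, data, batch_size=8, tau=1e-2, lr=1.0,
                 n_iters=200, seed=0):
    """Subsampled Newton sketch: estimate gradient and Hessian on a mini-batch
    and take a damped Newton step with a tau * I regularizer."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    n = len(data)
    for _ in range(n_iters):
        idx = rng.choice(n, size=batch_size, replace=False)
        g = np.mean([grad_fn(x, data[i]) for i in idx], axis=0)
        H = np.mean([hess_fn(x, data[i]) for i in idx], axis=0)
        x = x - lr * np.linalg.solve(H + tau * np.eye(len(x)), g)
    return x

# Toy usage: least squares on random data, one (a_i, b_i) pair per sample.
rng = np.random.default_rng(1)
A, b = rng.standard_normal((100, 10)), rng.standard_normal(100)
data = list(zip(A, b))
grad_fn = lambda x, d: (d[0] @ x - d[1]) * d[0]
hess_fn = lambda x, d: np.outer(d[0], d[0])
print(r_ssn_sketch(grad_fn, hess_fn, np.zeros(10), data))
```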

Random Gradient-Free Minimization of Convex Functions

TLDR
New complexity bounds are proved for methods of convex optimization based only on computation of the function value, showing that such methods usually need at most n times more iterations than standard gradient methods, where n is the dimension of the space of variables.
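The sketch below illustrates the basic random gradient-free scheme this reference studies, under illustrative parameter choices: estimate a directional derivative along a random Gaussian direction from two function values and step along that direction. This is a hedged sketch, not the paper's tuned method.

```python
import numpy as np

def gradient_free_minimize(f, x0, mu=1e-4, lr=1e-2, n_iters=20_000, seed=0):
    """Gaussian-smoothing gradient-free minimization sketch: each iteration
    uses two function values to form a forward-difference directional
    derivative along a random direction u, then steps along u."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        u = rng.standard_normal(x.shape)
        g = (f(x + mu * u) - f(x)) / mu * u   # forward-difference estimate along u
        x = x - lr * g
    return x

# Toy convex quadratic in 10 dimensions; only function values are ever queried.
f = lambda x: 0.5 * np.sum(x ** 2)
print(np.linalg.norm(gradient_free_minimize(f, np.ones(10))))  # should be small
```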

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

TLDR
This paper proposes a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high-dimensional saddle points, unlike gradient descent and quasi-Newton methods. The algorithm is applied to deep and recurrent neural network training, and numerical evidence is provided for its superior optimization performance.
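A minimal sketch of the core saddle-free-Newton idea from this reference: rescale the gradient by |H|^{-1}, where |H| takes the absolute value of each Hessian eigenvalue, so that negative-curvature directions are descended rather than ascended. The exact eigendecomposition and the damping term are illustrative simplifications; large-scale training would use an approximation instead.

```python
import numpy as np

def saddle_free_newton_step(g, H, damping=1e-3):
    """One saddle-free-Newton-style step: -|H|^{-1} g, with |H| built by
    replacing each eigenvalue of H with its absolute value (damping added
    for numerical stability in this sketch)."""
    eigvals, eigvecs = np.linalg.eigh(H)               # H = V diag(lambda) V^T
    abs_inv = 1.0 / (np.abs(eigvals) + damping)        # |lambda|^{-1}, damped
    return -eigvecs @ (abs_inv * (eigvecs.T @ g))

# Near a saddle of f(x, y) = x^2 - y^2 the Hessian has eigenvalues {2, -2}.
# At the point (0.05, 0.05) the step below moves downhill along both
# eigendirections, whereas a plain Newton step would move toward the saddle.
H = np.diag([2.0, -2.0])
g = np.array([0.1, -0.1])                              # gradient of f at (0.05, 0.05)
print(saddle_free_newton_step(g, H))
```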

SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator

TLDR
This paper proposes a new technique named SPIDER, which can be used to track many deterministic quantities of interest with significantly reduced computational cost, and proves that SPIDER-SFO nearly matches the algorithmic lower bound for finding approximate first-order stationary points under the gradient-Lipschitz assumption in the finite-sum setting.
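A hedged sketch of a SPIDER-style gradient estimator inside a stochastic first-order loop: refresh the estimate with a full gradient every q iterations and, in between, update it with mini-batch gradient differences along the path. Step size, refresh period, batch size, and the toy finite-sum problem are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def spider_sfo_sketch(grad_i, n, x0, lr=1e-2, q=10, batch=5, n_iters=200, seed=0):
    """SPIDER-style loop: v tracks the full gradient via path-integrated
    mini-batch differences, with a periodic full-gradient refresh."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    full_grad = lambda z: np.mean([grad_i(z, i) for i in range(n)], axis=0)
    v = full_grad(x)
    for k in range(1, n_iters):
        x_new = x - lr * v
        if k % q == 0:
            v = full_grad(x_new)                        # periodic full refresh
        else:
            S = rng.integers(0, n, size=batch)          # small mini-batch
            v = v + np.mean([grad_i(x_new, i) - grad_i(x, i) for i in S], axis=0)
        x = x_new
    return x

# Toy finite sum: f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2.
rng = np.random.default_rng(1)
A, b = rng.standard_normal((50, 10)), rng.standard_normal(50)
x = spider_sfo_sketch(lambda x, i: (A[i] @ x - b[i]) * A[i], 50, np.zeros(10))
print("objective:", 0.5 * np.mean((A @ x - b) ** 2))
```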
...