• Corpus ID: 221970024

Escaping Saddle-Points Faster under Interpolation-like Conditions

@article{Roy2020EscapingSF,
  title={Escaping Saddle-Points Faster under Interpolation-like Conditions},
  author={Abhishek Roy and Krishnakumar Balasubramanian and Saeed Ghadimi and Prasant Mohapatra},
  journal={ArXiv},
  year={2020},
  volume={abs/2009.13016}
}
• Published 28 September 2020
• Computer Science
• ArXiv
In this paper, we show that under over-parametrization several standard stochastic optimization algorithms escape saddle-points and converge to local-minimizers much faster. One of the fundamental aspects of over-parametrized models is that they are capable of interpolating the training data. We show that, under interpolation-like assumptions satisfied by the stochastic gradients in an over-parametrization setting, the first-order oracle complexity of Perturbed Stochastic Gradient Descent (PSGD…
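The abstract refers to Perturbed Stochastic Gradient Descent: gradient steps augmented with occasional isotropic noise injections that let the iterate escape strict saddle points. Below is a minimal illustrative sketch of that general idea (not the paper's exact algorithm or analysis); the parameter names and the noise-triggering rule are simplifications chosen for clarity.

```python
import numpy as np

def perturbed_sgd(grad_fn, x0, lr=0.01, noise_radius=0.1,
                  grad_threshold=0.05, n_steps=500, seed=0):
    """Sketch of perturbed gradient descent: take plain gradient
    steps, and inject uniform noise from a small ball whenever the
    gradient is small (a candidate saddle point). Hyperparameters
    here are illustrative, not taken from the paper."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        if np.linalg.norm(grad_fn(x)) <= grad_threshold:
            # Near a stationary point: perturb so the iterate can
            # drift into a descent direction and escape the saddle.
            x = x + rng.uniform(-noise_radius, noise_radius, size=x.shape)
        x = x - lr * grad_fn(x)
    return x

# f(x, y) = x^2 - y^2 has a strict saddle at the origin; plain
# gradient descent started exactly there would never move.
saddle_grad = lambda z: np.array([2.0 * z[0], -2.0 * z[1]])
x_final = perturbed_sgd(saddle_grad, np.zeros(2))
```

The paper's contribution is the complexity analysis of such methods under interpolation-like assumptions on the stochastic gradients, not the perturbation mechanism itself.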
2 Citations


Zeroth-order algorithms for nonconvex–strongly-concave minimax problems with improved complexities

• Computer Science
Journal of Global Optimization
• 2022
This paper designs and analyzes a zeroth-order algorithm for minimax optimization problems that are nonconvex in one variable and strongly-concave in the other variable, and proposes the Zeroth-Order Gradient Descent Multi-Step Ascent algorithm, which improves the oracle complexity of ZO-GDA.

Zeroth-Order Algorithms for Nonconvex Minimax Problems with Improved Complexities

• Computer Science
ArXiv
• 2020
This paper designs and analyzes the Zeroth-Order Gradient Descent Ascent algorithm, providing improved oracle complexity compared to existing works, and proposes a new algorithm that significantly improves the oracle complexity further.

References

Showing 1-10 of 69 references

Sharp Analysis for Nonconvex SGD Escaping from Saddle Points

• Computer Science
COLT
• 2019
A sharp analysis for Stochastic Gradient Descent is given, proving that SGD is able to efficiently escape from saddle points and find an approximate second-order stationary point in $\tilde{O}(\epsilon^{-3.5})$ stochastic gradient computations for generic nonconvex optimization problems, when the objective function satisfies gradient-Lipschitz, Hessian-Lipschitz, and dispersive noise assumptions.

Escaping Saddle Points in Constrained Optimization

• Computer Science, Mathematics
NeurIPS
• 2018
This paper proposes a generic framework that yields convergence to a second-order stationary point of the problem, provided the convex set $\mathcal{C}$ is simple for a quadratic objective function.

• Computer Science
2020 54th Annual Conference on Information Sciences and Systems (CISS)
• 2020
It is shown that the proposed model-free non-convex optimization algorithm returns an ε-second-order stationary point with queries of the function for any arbitrary θ > 0.

Stochastic Cubic Regularization for Fast Nonconvex Optimization

• Computer Science, Mathematics
NeurIPS
• 2018
The proposed algorithm efficiently escapes saddle points and finds approximate local minima for general smooth, nonconvex functions in only $\mathcal{\tilde{O}}(\epsilon^{-3.5})$ stochastic gradient and stochastic Hessian-vector product evaluations.

Gradient Descent Efficiently Finds the Cubic-Regularized Non-Convex Newton Step

• Mathematics, Computer Science
ArXiv
• 2016
We consider the minimization of non-convex quadratic forms regularized by a cubic term, which exhibit multiple saddle points and poor local minima. Nonetheless, we prove that, under mild assumptions…

• Computer Science, Mathematics
NeurIPS
• 2019
This work establishes asymptotic convergence to second-order stationary points for derivative-free algorithms for non-convex optimization that use only function evaluations rather than gradients, via a carefully tailored application of the Stable Manifold Theorem.

Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation

• Computer Science, Mathematics
AISTATS
• 2020
Stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied by over-parameterized models are considered and the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size.

Random Gradient-Free Minimization of Convex Functions

• Computer Science, Mathematics
Found. Comput. Math.
• 2017
New complexity bounds are proved for methods of convex optimization based only on computation of the function value; it appears that such methods usually need at most n times more iterations than standard gradient methods, where n is the dimension of the space of variables.

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

• Computer Science
NIPS
• 2014
This paper proposes a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high dimensional saddle points, unlike gradient descent and quasi-Newton methods, and applies this algorithm to deep or recurrent neural network training, and provides numerical evidence for its superior optimization performance.

SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator

• Computer Science, Mathematics
NeurIPS
• 2018
This paper proposes a new technique named SPIDER, which can be used to track many deterministic quantities of interest with significantly reduced computational cost and proves that SPIDER-SFO nearly matches the algorithmic lower bound for finding approximate first-order stationary points under the gradient Lipschitz assumption in the finite-sum setting.