Corpus ID: 12146655

# On Graduated Optimization for Stochastic Non-Convex Problems

@article{Hazan2016OnGO,
title={On Graduated Optimization for Stochastic Non-Convex Problems},
author={Elad Hazan and Kfir Yehuda Levy and Shai Shalev-Shwartz},
journal={ArXiv},
year={2016},
volume={abs/1503.03712}
}
• Published 2016
• Computer Science, Mathematics
• ArXiv
The graduated optimization approach, also known as the continuation method, is a popular heuristic to solving non-convex problems that has received renewed interest over the last decade. Despite its popularity, very little is known in terms of theoretical convergence analysis. In this paper we describe a new first-order algorithm based on graduated optimiza- tion and analyze its performance. We characterize a parameterized family of non- convex functions for which this algorithm provably… Expand
• Computer Science, Mathematics
• ArXiv
• 2017
Experimental results illustrate that the new algorithms with the similar performance can converge to 'global' optimums of the nonconvex problems, and they converge faster than the GradOpt and the non Convex proximal SVRG. Expand
SVRG for a non-convex problem using graduated optimization algorithm
• Computer Science
• J. Intell. Fuzzy Syst.
• 2018
Experimental results illustrate that two new algorithms with similar performance can converge to the "global" optima of a non-convex problem comparatively faster than GradOpt or non- Convex Prox-SVRG. Expand
Convergence Analysis of Homotopy-SGD for non-convex optimization
• Computer Science, Mathematics
• ArXiv
• 2020
This work presents a first-order stochastic algorithm based on a combination of homotopy methods and SGD, called Homotopy-Stochastic Gradient Descent (H-SGD), which finds interesting connections with some proposed heuristics in the literature, e.g. optimization by Gaussian continuation, training by diffusion, mollifying networks. Expand
GS-OPT: A new fast stochastic algorithm for solving the non-convex optimization problem
• Computer Science
• 2020
A new algorithm namely GS-OPT (General Stochastic OPTimization) is proposed which is effective for solving the non-convex problems by combining two stochastic bounds of the objective function made by a commonly discrete probability distribution namely Bernoulli. Expand
• Mathematics, Computer Science
• IEEE Transactions on Signal Processing
• 2018
Somewhat surprisingly, the decentralized consensus algorithms, DGD and Prox-DGD, retain most other properties that are known in the convex setting, and can take the constraint to a nonconvex set with an easy projection. Expand
Convex Optimization with Unbounded Nonconvex Oracles using Simulated Annealing
• Computer Science, Mathematics
• COLT
• 2018
This paper studies the more general case when the noise has magnitude $\alpha F(x) + \beta$ for some $\alpha, \beta > 0$ and presents a polynomial time algorithm that finds an approximate minimizer of $F$ for this noise model. Expand
Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis
• Mathematics, Computer Science
• COLT
• 2017
The present work provides a nonasymptotic analysis in the context of non-convex learning problems, giving finite-time guarantees for SGLD to find approximate minimizers of both empirical and population risks. Expand
Convex Relaxation Regression: Black-Box Optimization of Smooth Functions by Learning Their Convex Envelopes
• Computer Science, Mathematics
• UAI
• 2016
Convex Relaxation Regression (CoRR) is introduced, an approach for learning convex relaxations for a class of smooth functions and it is proved that with probability greater than $1-\delta$, the solution of the algorithm converges to the global optimizer of $f$ with error. Expand
Homotopy Analysis for Tensor PCA
• Mathematics, Computer Science
• COLT
• 2017
The class of homotopy or continuation methods for global optimization of nonconvex functions, which start from an objective function that is efficient to optimize, and progressively modify it to obtain the required objective, and the solutions are passed along the Homotopy path. Expand
DynaNewton - Accelerating Newton's Method for Machine Learning
• Computer Science
• ArXiv
• 2016
A novel continuation method, where a family of objectives is defined over increasing sample sizes and with decreasing regularization strength, to accelerate Newton's method, namely, subsampling training data and increasing strong convexity through regularization. Expand

#### References

SHOWING 1-10 OF 26 REFERENCES
Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization
• Mathematics, Computer Science
• ICML
• 2012
This paper investigates the optimality of SGD in a stochastic setting, and shows that for smooth problems, the algorithm attains the optimal O(1/T) rate, however, for non-smooth problems the convergence rate with averaging might really be Ω(log(T)/T), and this is not just an artifact of the analysis. Expand
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
• Computer Science, Mathematics
• NIPS
• 2014
This paper proposes a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high dimensional saddle points, unlike gradient descent and quasi-Newton methods, and applies this algorithm to deep or recurrent neural network training, and provides numerical evidence for its superior optimization performance. Expand
A Theoretical Analysis of Optimization by Gaussian Continuation
• Computer Science
• AAAI
• 2015
A theoretical analysis is provided that provides a bound on the end-point solution of the continuation method and it is shown that this characteristic can be analytically computed when the objective function is expressed in some suitable basis functions. Expand
On the Link between Gaussian Homotopy Continuation and Convex Envelopes
• Mathematics, Computer Science
• EMMCVPR
• 2014
It is proved that Gaussian smoothing emerges from the best affine approximation to Vese’s nonlinear PDE, hence providing the optimal convexification. Expand
A Path Following Algorithm for the Graph Matching Problem
• Computer Science, Medicine
• IEEE Transactions on Pattern Analysis and Machine Intelligence
• 2009
This method allows to easily integrate the information on graph label similarities into the optimization problem, and therefore, perform labeled weighted graph matching and is compared with some of the best performing graph matching methods on four data sets. Expand
A GNC Algorithm for Deblurring Images with Interacting Discontinuities
In this paper we present a Graduated Non-Convexity (GNC) algorithm for reconstructing images. We assume that the data images are blurred and corrupted by white Gaussian noise. Geometric features ofExpand
Online convex optimization in the bandit setting: gradient descent without a gradient
• Mathematics, Computer Science
• SODA '05
• 2005
It is possible to use gradient descent without seeing anything more than the value of the functions at a single point, and the guarantees hold even in the most general case: online against an adaptive adversary. Expand
Smoothing a Program Soundly and Robustly
• Computer Science
• CAV
• 2011
The foundations of smooth interpretation are studied, a recently-proposed program approximation scheme that facilitates the use of local numerical search techniques in program analysis and synthesis and the design of an approximate smooth interpreter is described that provably satisfies soundness, robustness, and β-robustness. Expand
Numerical continuation methods - an introduction
• Mathematics, Computer Science
• Springer series in computational mathematics
• 1990
This book discusses Continuation Methods, Newton's Method and Orthogonal Decompositions Revisited, and Update Methods and their Numerical Stability. Expand
The Effective Energy Transformation Scheme as a Special Continuation Approach to Global Optimization with Application to Molecular Conformation
• Z. Wu
• Mathematics, Computer Science
• SIAM J. Optim.
• 1996
This paper discusses a generalization of the function transformation scheme used in Coleman, Shalloway, and Wu for global energy minimization applied to the molecular conformation problem and shows that the method can transform a nonlinear objective function into a class of gradually deformed, but smoother'' or easier'' functions. Expand