Corpus ID: 2976644

Fast Stochastic Methods for Nonsmooth Nonconvex Optimization

@article{Reddi2016FastSM,
  title={Fast Stochastic Methods for Nonsmooth Nonconvex Optimization},
  author={Sashank J. Reddi and Suvrit Sra and Barnab{\'a}s P{\'o}czos and Alex Smola},
  journal={ArXiv},
  year={2016},
  volume={abs/1605.06900}
}
We analyze stochastic algorithms for optimizing nonconvex, nonsmooth finite-sum problems, where the nonconvex part is smooth and the nonsmooth part is convex. Surprisingly, unlike the smooth case, our knowledge of this fundamental problem is very limited. For example, it is not known whether the proximal stochastic gradient method with constant minibatch converges to a stationary point. To tackle this issue, we develop fast stochastic algorithms that provably converge to a stationary point for… 
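For concreteness, the problem class described in the abstract is the composite finite-sum objective below; the formulation and the proximal stochastic gradient step are given in standard notation as a sketch, and need not match the paper's exact symbols.

  \min_{x \in \mathbb{R}^d} \; F(x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x) + h(x),

where each $f_i$ is smooth but possibly nonconvex and $h$ is convex but possibly nonsmooth (for example an $\ell_1$ penalty or the indicator of a convex constraint set). The proximal stochastic gradient method mentioned above iterates

  x_{t+1} = \mathrm{prox}_{\eta_t h}\big(x_t - \eta_t \nabla f_{i_t}(x_t)\big),
  \qquad
  \mathrm{prox}_{\eta h}(y) = \arg\min_z \Big\{ h(z) + \tfrac{1}{2\eta} \|z - y\|^2 \Big\},

with $i_t$ drawn uniformly from $\{1, \dots, n\}$ (or replaced by a minibatch average).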

Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization
TLDR
This work develops fast stochastic algorithms that provably converge to a stationary point for constant minibatches, and proves a global linear convergence rate for an interesting subclass of nonsmooth nonconvex functions, which subsumes several recent works.
Fast incremental method for smooth nonconvex optimization
TLDR
This paper analyzes the SAGA algorithm within an Incremental First-order Oracle framework, and shows that it converges to a stationary point provably faster than both gradient descent and stochastic gradient descent.
Linear Convergence of Accelerated Stochastic Gradient Descent for Nonconvex Nonsmooth Optimization
TLDR
It is proved for the first time that the accelerated SGD method converges linearly to a local minimum of the nonconvex objective.
Stochastic Variance Reduction for Nonconvex Optimization
TLDR
This work proves non-asymptotic rates of convergence of SVRG for nonconvex optimization, and shows that it is provably faster than SGD and gradient descent.
Stochastic Frank-Wolfe methods for nonconvex optimization
TLDR
For objective functions that decompose into a finite-sum, ideas from variance reduction for convex optimization are leveraged to obtain new variance-reduced nonconvex Frank-Wolfe methods that have provably faster convergence than the classical Frank-Wolfe method.
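For reference, the classical Frank-Wolfe step that these variance-reduced variants modify is, in generic notation (a sketch, not taken from the cited paper):

  s_t \in \arg\min_{s \in \Omega} \langle s, \nabla f(x_t) \rangle,
  \qquad
  x_{t+1} = (1 - \gamma_t)\, x_t + \gamma_t\, s_t,

where $\Omega$ is the constraint set and $\gamma_t \in (0, 1]$ is the step size; the variance-reduced variants replace the exact gradient $\nabla f(x_t)$ with a cheaper stochastic estimate whose variance is kept under control.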
Stochastic proximal quasi-Newton methods for non-convex composite optimization
TLDR
This paper proposes a generic algorithmic framework for stochastic proximal quasi-Newton (SPQN) methods to solve non-convex composite optimization problems, and incorporates a modified self-scaling symmetric rank-one update into this framework, yielding what is called the stochastic symmetric rank-one method.
Efficient Learning with a Family of Nonconvex Regularizers by Redistributing Nonconvexity
TLDR
This paper proposes to move the nonconvexity from the regularizer to the loss, so that the regularizer is transformed into a familiar convex one, while the resulting loss function can still be guaranteed to be smooth.
Adaptive Methods for Nonconvex Optimization
TLDR
The analysis implies that increasing minibatch sizes enables convergence, thus providing a way to circumvent the non-convergence issues; the paper also provides a new adaptive optimization algorithm, Yogi, which controls the increase in the effective learning rate, leading to even better performance with similar theoretical guarantees on convergence.
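As a point of reference for the mechanism mentioned above, the commonly stated difference between Adam and Yogi lies in the second-moment update (reproduced from memory as an assumption, not from the cited abstract):

  v_t = \beta_2\, v_{t-1} + (1 - \beta_2)\, g_t^2   (Adam)
  v_t = v_{t-1} - (1 - \beta_2)\, \mathrm{sign}(v_{t-1} - g_t^2)\, g_t^2   (Yogi)

Both methods then take the step $x_{t+1} = x_t - \eta\, m_t / (\sqrt{v_t} + \epsilon)$; Yogi's additive update prevents $v_t$ from shrinking too quickly, which is what controls the increase in the effective learning rate $\eta / \sqrt{v_t}$.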
Accelerated Method for Stochastic Composition Optimization with Nonsmooth Regularization
TLDR
This paper proposes a new stochastic composition optimization method for composition problems with a nonsmooth regularization penalty that significantly improves the state-of-the-art convergence rate from $O(T^{-1/2})$ to $O((n_1+n_2)^{2/3}T^{-1})$.
Mini-Batch Stochastic ADMMs for Nonconvex Nonsmooth Optimization
TLDR
It is proved that, given an appropriate mini-batch size, the mini-batch stochastic ADMM without the variance reduction (VR) technique is convergent and reaches a convergence rate of $O(1/T)$ for obtaining a stationary point of the nonconvex problem, where $T$ denotes the number of iterations.
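For orientation, a standard ADMM splitting of the composite problem $\min_x f(x) + h(x)$ introduces a copy $z$ of the variable and alternates the following updates (a generic sketch, not the cited paper's exact stochastic variant):

  x^{k+1} = \arg\min_x \Big\{ f(x) + \tfrac{\rho}{2} \|x - z^k + u^k\|^2 \Big\},
  \qquad
  z^{k+1} = \mathrm{prox}_{h/\rho}\big(x^{k+1} + u^k\big),
  \qquad
  u^{k+1} = u^k + x^{k+1} - z^{k+1};

the mini-batch stochastic versions discussed here replace the exact $x$-update with an inexact step driven by mini-batch gradients of $f$.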
...
...

References

SHOWING 1-10 OF 46 REFERENCES
Fast Incremental Method for Nonconvex Optimization
TLDR
This paper analyzes the SAGA algorithm within an Incremental First-order Oracle framework, and shows that it converges to a stationary point provably faster than both gradient descent and stochastic gradient descent.
Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming
TLDR
This paper discusses a variant of the algorithm which applies a post-optimization phase to evaluate a short list of solutions generated by several independent runs of the randomized stochastic gradient (RSG) method, and shows that this modification significantly improves the large-deviation properties of the algorithm.
Stochastic Variance Reduction for Nonconvex Optimization
TLDR
This work proves non-asymptotic rates of convergence of SVRG for nonconvex optimization, and shows that it is provably faster than SGD and gradient descent.
Scalable nonconvex inexact proximal splitting (S. Sra, NIPS 2012)
TLDR
This work is the first to develop and analyze incremental nonconvex proximal-splitting algorithms, and introduces a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors.
Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization
TLDR
A randomized stochastic projected gradient (RSPG) algorithm is proposed, in which a proper mini-batch of samples is taken at each iteration depending on the total budget of stochastic samples allowed; the analysis shows that the algorithm achieves nearly optimal complexity for convex stochastic programming.
Variance Reduction for Faster Non-Convex Optimization
TLDR
This work considers the fundamental problem in non-convex optimization of efficiently reaching a stationary point, and proposes a first-order minibatch stochastic method that converges with an $O(1/\varepsilon)$ rate and is faster than full gradient descent by a factor of $\Omega(n^{1/3})$.
Stochastic Variance Reduced Optimization for Nonconvex Sparse Learning
We propose a stochastic variance reduced optimization algorithm for solving a class of large-scale nonconvex optimization problems with cardinality constraints, and provide sufficient conditions… 
A Proximal Stochastic Gradient Method with Progressive Variance Reduction
TLDR
This work proposes and analyzes a new proximal stochastic gradient method, which uses a multistage scheme to progressively reduce the variance of the stochastic gradient.
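A minimal sketch of the multistage variance-reduction idea behind this method (prox-SVRG style) is given below; the soft-thresholding proximal operator (an $\ell_1$ regularizer), the step size, and all function names are illustrative assumptions rather than the cited paper's implementation.

import numpy as np

def soft_threshold(y, tau):
    # Proximal operator of tau * ||.||_1 (illustrative choice of the nonsmooth term h).
    return np.sign(y) * np.maximum(np.abs(y) - tau, 0.0)

def prox_svrg(grad_i, n, x0, eta=0.01, lam=0.1, stages=10, inner_steps=None, seed=0):
    # grad_i(i, x): gradient of the i-th smooth component f_i at x.
    rng = np.random.default_rng(seed)
    m = n if inner_steps is None else inner_steps
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(stages):
        # Multistage scheme: take one full-gradient snapshot per stage...
        x_snap = x.copy()
        full_grad = np.mean([grad_i(i, x_snap) for i in range(n)], axis=0)
        for _ in range(m):
            i = int(rng.integers(n))
            # ...and form a variance-reduced estimate: unbiased, with variance
            # that shrinks as the iterate stays close to the snapshot.
            v = grad_i(i, x) - grad_i(i, x_snap) + full_grad
            x = soft_threshold(x - eta * v, eta * lam)
    return x

The only inputs needed are a callable grad_i and the component count n; the same template covers smooth nonconvex components f_i, which is the setting of the main paper above.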
Linear Convergence of Proximal-Gradient Methods under the Polyak-Łojasiewicz Condition
TLDR
This work shows that the Polyak-Łojasiewicz inequality is actually weaker than the four main conditions that have been explored to show linear convergence rates without strong convexity over the last 25 years, and considers a natural generalization of the inequality that applies to proximal-gradient methods for non-smooth optimization.
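For reference, the Polyak-Łojasiewicz inequality referred to here states, in its basic smooth form, that for some $\mu > 0$

  \tfrac{1}{2} \|\nabla f(x)\|^2 \;\ge\; \mu \big( f(x) - f^* \big) \quad \text{for all } x,

where $f^*$ is the minimum value of $f$. Strong convexity with parameter $\mu$ implies this inequality but the converse fails, which is why it is the weaker condition; the proximal generalization mentioned above replaces the gradient norm with an analogous quantity built from the proximal-gradient step (details omitted here).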
Asynchronous Stochastic Coordinate Descent: Parallelism and Convergence Properties
TLDR
An asynchronous parallel stochastic proximal coordinate descent algorithm for minimizing a composite objective function, which consists of a smooth convex function added to a separable convex function, achieves a linear convergence rate on functions that satisfy an optimal strong convexity property and a sublinear rate on general convex functions.
...
...