Corpus ID: 15824822

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

@article{Rakhlin2012MakingGD,
  title={Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization},
  author={Alexander Rakhlin and Ohad Shamir and Karthik Sridharan},
  journal={ArXiv},
  year={2012},
  volume={abs/1109.5647}
}
Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(log(T)/T), by running SGD for T iterations and returning the average point. However, recent results showed that using a different algorithm, one can get an optimal O(1/T) rate. This might lead one to believe that standard SGD is suboptimal, and maybe should even be replaced as a method of… 
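For concreteness, a minimal sketch of the setup the abstract describes: plain SGD with step size 1/(λt) on a strongly convex objective, tracking both the last iterate and the average iterate. The toy objective and noise model below are illustrative assumptions, not taken from the paper:

```python
import random

def sgd_strongly_convex(grad_oracle, w0, lam, T):
    """Plain SGD with step size 1/(lam*t); tracks last iterate and running average."""
    w = list(w0)
    avg = list(w0)
    for t in range(1, T + 1):
        g = grad_oracle(w)
        w = [wi - gi / (lam * t) for wi, gi in zip(w, g)]       # eta_t = 1/(lambda*t)
        avg = [ai + (wi - ai) / (t + 1) for ai, wi in zip(avg, w)]  # running average
    return w, avg

# Toy example: F(w) = 0.5*lam*||w||^2, with gradients corrupted by Gaussian noise.
random.seed(0)
lam = 1.0
noisy_grad = lambda w: [lam * wi + random.gauss(0.0, 0.1) for wi in w]
w_last, w_avg = sgd_strongly_convex(noisy_grad, [1.0] * 5, lam, 10_000)
```

In this strongly convex toy setting both the average and the last iterate end up close to the optimum w* = 0; the paper's question is about the rate at which this happens for each of them.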


Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes
Investigates the performance of SGD without non-trivial smoothness assumptions, analyzes running-average schemes that convert the SGD iterates into a solution with optimal optimization accuracy, and proposes a new, simple averaging scheme that not only attains optimal rates but can also be easily computed on the fly.
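An averaging scheme "computed on the fly" can be sketched as a recursive weighted average that up-weights later iterates. The update rule and the parameter name `eta` below are illustrative assumptions, not taken verbatim from the cited paper:

```python
def polynomial_decay_average(iterates, eta=3.0):
    """On-the-fly weighted average that up-weights later iterates.

    Each new iterate gets weight (eta+1)/(t+eta); with eta = 0 this
    reduces to the plain running average of all iterates.
    """
    avg = None
    for t, w in enumerate(iterates, start=1):
        if avg is None:
            avg = list(w)
        else:
            c = (eta + 1.0) / (t + eta)  # weight on the newest iterate
            avg = [(1.0 - c) * ai + c * wi for ai, wi in zip(avg, w)]
    return avg
```

Because the average is updated one iterate at a time, it needs only O(d) memory regardless of the number of SGD steps.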
Open Problem: Is Averaging Needed for Strongly Convex Stochastic Gradient Descent?
Asks whether averaging is needed at all to obtain the optimal rate for strongly convex stochastic gradient descent; the algorithm makes use of an oracle that returns a random vector whose expectation is a subgradient of F(w).
Stochastic Algorithm with Optimal Convergence Rate for Strongly Convex Optimization Problems
A weighted algorithm based on COMID that keeps the sparsity imposed by the L1 regularization term, with a proof that it achieves an O(1/T) convergence rate.
Efficient Stochastic Gradient Descent for Strongly Convex Optimization
An epoch-projection SGD method that makes only a small number of projections, fewer than $\log_2 T$, yet achieves the optimal convergence rate for strongly convex optimization, together with a proximal extension that exploits the structure of the objective function to further speed up computation and convergence for sparse regularized loss minimization problems.
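A rough illustration of the epoch-projection idea: a constant step size within each epoch, a single projection at each epoch boundary, with the step size halved and the epoch length doubled between epochs. The function names and schedule constants are assumptions for the sketch, not the paper's actual algorithm:

```python
def epoch_gd(grad_oracle, project, w0, eta1=1.0, T1=2, total=10_000):
    """Epoch-based SGD sketch: fixed step size within an epoch, average the
    epoch's iterates, and project only once per epoch boundary."""
    w = list(w0)
    eta, T, used = eta1, T1, 0
    while used < total:
        avg = [0.0] * len(w)
        for t in range(1, T + 1):
            g = grad_oracle(w)
            w = [wi - eta * gi for wi, gi in zip(w, g)]
            avg = [ai + (wi - ai) / t for ai, wi in zip(avg, w)]  # epoch average
            used += 1
            if used >= total:
                break
        w = project(avg)              # the only projection in this epoch
        eta, T = eta / 2.0, 2 * T     # halve step size, double epoch length
    return w
```

The point of the schedule is that the number of projections grows only logarithmically in the total number of gradient steps, which matters when the projection is expensive.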
Problem Setup and Main Results (2019)
Designs a modification scheme that converts one sequence of step sizes into another so that the last iterate of SGD/GD with the modified sequence has the same suboptimality guarantees as the average of SGD/GD with the original sequence, and shows that this result holds with high probability.
Stochastic Learning via Optimizing the Variational Inequalities
The proposed stochastic ADMM (SADMM) is proved to have an O(1/t) VI-convergence rate for l1-regularized hinge loss problems without strong convexity or smoothness, and a new VI criterion is defined to measure the convergence of stochastic algorithms.
Tight Analyses for Non-Smooth Stochastic Gradient Descent
Proves that after $T$ steps of stochastic gradient descent, the error of the final iterate is $O(\log(T)/T)$ with high probability, and that there exists a function in this class for which the error of the final iterate of deterministic gradient descent is $\Omega(\log(T)/\sqrt{T})$.
Understanding the role of averaging in non-smooth stochastic gradient descent
Proves that after T steps of stochastic gradient descent (SGD), the error of the final iterate is O(log(T)/T) with high probability, shows that there exists a function for which this rate is attained, and proves the results using a generalization of Freedman's inequality.
Optimal Stochastic Strongly Convex Optimization with a Logarithmic Number of Projections
Considers stochastic strongly convex optimization with a complex inequality constraint and proposes an Epoch-Projection Stochastic Gradient Descent (Epro-SGD) method, together with a variant, Epro-ORDA, based on the optimal regularized dual averaging method.

References

Showing 1-10 of 17 references
Stochastic Convex Optimization
Studies stochastic convex optimization and shows that the key ingredient is strong convexity and regularization, which is a sufficient, but not necessary, condition for meaningful non-trivial learnability.
Robust Stochastic Approximation Approach to Stochastic Programming
Demonstrates that a properly modified SA approach can be competitive with, and even significantly outperform, the SAA method for a certain class of convex stochastic problems.
Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning
Provides a non-asymptotic analysis of the convergence of two well-known algorithms, stochastic gradient descent and a simple modification in which the iterates are averaged, suggesting that a learning rate proportional to the inverse of the number of iterations, while yielding the optimal convergence rate, is not robust to the absence of strong convexity or to the setting of the proportionality constant.
Logarithmic regret algorithms for online convex optimization
Proposes several algorithms that achieve logarithmic regret and, besides being more general, are much more efficient to implement; these give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field.
Pegasos: primal estimated sub-gradient solver for SVM
A simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by support vector machines, particularly well suited for large text classification problems, with a demonstrated order-of-magnitude speedup over previous SVM learning methods.
Primal-dual subgradient methods for minimizing uniformly convex functions
Provides accuracy bounds for the performance of non-Euclidean deterministic and stochastic algorithms and designs methods that are adaptive with respect to the strong- or uniform-convexity parameters of the objective.
High-Probability Regret Bounds for Bandit Online Linear Optimization
Eliminates the gap between the high-probability bounds obtained in the full-information and bandit settings, and improves on a previous algorithm [8] whose regret is bounded only in expectation against an oblivious adversary.
A General Class of Exponential Inequalities for Martingales and Ratios
In this paper we introduce a technique for obtaining exponential inequalities, with particular emphasis placed on results involving ratios. Our main applications consist of approximations to the tail
Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization
Gives an algorithm that performs only gradient updates yet attains the optimal rate of convergence for stochastic optimization with a strongly convex objective.
Stochastic Approximation and Recursive Algorithms and Applications
A book-length treatment; contents include a review of continuous-time models, martingales and martingale inequalities, stochastic integration, stochastic differential equations and diffusions, and reflected diffusions.