# Tail bounds for stochastic approximation

@article{Friedlander2013TailBF, title={Tail bounds for stochastic approximation}, author={Michael P. Friedlander and Gabriel Goh}, journal={arXiv: Optimization and Control}, year={2013} }

Stochastic-approximation gradient methods are attractive for large-scale convex optimization because they offer inexpensive iterations. They are especially popular in data-fitting and machine-learning applications where the data arrives in a continuous stream, or it is necessary to minimize large sums of functions. It is known that by appropriately decreasing the variance of the error at each iteration, the expected rate of convergence matches that of the underlying deterministic gradient…
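The decreasing-variance idea from the abstract can be illustrated with a minimal sketch (the setup is invented for this example, not the paper's algorithm): minimize f(x) = 0.5·x² with a noisy gradient, where averaging a geometrically growing number of samples per iteration shrinks the gradient-error variance and recovers steady convergence.

```python
import random

def noisy_gradient(x, batch_size, rng):
    # Average batch_size noisy gradient evaluations of f(x) = 0.5 * x**2;
    # the error variance scales like 1 / batch_size.
    noise = sum(rng.gauss(0.0, 1.0) for _ in range(batch_size)) / batch_size
    return x + noise

def sgd_growing_batch(x0, steps=15, step=0.5, seed=0):
    rng = random.Random(seed)
    x, batch = x0, 1
    for _ in range(steps):
        x -= step * noisy_gradient(x, batch, rng)
        batch *= 2  # doubling the batch halves the gradient-error variance
    return x
```

With the error variance decaying geometrically, the iterates contract toward the minimizer at essentially the deterministic rate despite the noise.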

## 6 Citations

### A Proximal Stochastic Gradient Method with Progressive Variance Reduction

- Computer Science, Mathematics · SIAM J. Optim.
- 2014

This work proposes and analyzes a new proximal stochastic gradient method, which uses a multistage scheme to progressively reduce the variance of the stochastic gradient.
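The multistage variance-reduction idea can be sketched in the SVRG style (a minimal illustration, not the paper's exact proximal method; the quadratic toy problem and function names are invented): each stage computes the full gradient at a snapshot point, and inner stochastic steps use an estimate whose variance shrinks as the iterate approaches the snapshot.

```python
import random

def svrg(grads, full_grad, x0, stages=20, inner=50, step=0.05, seed=0):
    # grads: list of per-component gradient functions; full_grad: their average.
    rng = random.Random(seed)
    n = len(grads)
    x = x0
    for _ in range(stages):
        snap = x
        mu = full_grad(snap)  # full gradient at the snapshot (once per stage)
        for _ in range(inner):
            i = rng.randrange(n)
            # variance-reduced, unbiased stochastic gradient estimate
            v = grads[i](x) - grads[i](snap) + mu
            x -= step * v
    return x

# Toy problem: minimize (1/n) * sum_i 0.5 * (x - a_i)**2, minimizer = mean(a).
a = [1.0, 2.0, 3.0, 4.0]
grads = [lambda x, ai=ai: x - ai for ai in a]
full_grad = lambda x: x - sum(a) / len(a)
```

On this toy problem the per-component Hessians coincide, so the variance-reduced estimate is exact and the iterates contract deterministically toward the mean.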

### Stochastic Adaptive Quasi-Newton Methods for Minimizing Expected Values

- Computer Science · ICML
- 2017

A novel class of stochastic, adaptive methods for minimizing self-concordant functions which can be expressed as an expected value is proposed, which includes extensions of gradient descent and BFGS.

### Extragradient Method with Variance Reduction for Stochastic Variational Inequalities

- Mathematics · SIAM J. Optim.
- 2017

We propose an extragradient method with stepsizes bounded away from zero for stochastic variational inequalities requiring only pseudomonotonicity. We provide convergence and complexity analysis, a…

### A Framework for Analyzing Stochastic Optimization Algorithms Under Dependence

- Computer Science
- 2020

This is the first work to analyze a fully stochastic BFGS algorithm, which also avoids time-consuming or even impossible line-search steps; it is proved that the algorithm converges linearly globally and super-linearly locally.

### Parallelizing sparse recovery algorithms: A stochastic approach

- Computer Science · 2014 19th International Conference on Digital Signal Processing
- 2014

This work proposes a novel technique, based on the principles of stochastic gradient descent, for accelerating sparse recovery algorithms on multi-core shared-memory architectures; the method is as accurate as the sequential version but significantly faster, and the speed-up grows with the problem size.

### Accelerating low-rank matrix completion on GPUs

- Computer Science · 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI)
- 2014

This work modifies and parallelizes a well-known matrix-completion algorithm so that it can be implemented on a GPU; the speed-up is significant and improves as the dataset grows, with no change in accuracy between the sequential and the proposed parallel implementations.

## References

Showing 1–10 of 25 references

### Approximation accuracy, gradient methods, and error bound for structured convex optimization

- Computer Science · Math. Program.
- 2010

An error bound for the linear convergence analysis of first-order gradient methods for solving convex optimization problems arising in applications, possibly as approximations of intractable problems.

### Hybrid Deterministic-Stochastic Methods for Data Fitting

- Computer Science · SIAM J. Sci. Comput.
- 2012

Rate-of-convergence analysis shows that by controlling the sample size in an incremental gradient algorithm, it is possible to maintain the steady convergence rates of full-gradient methods.

### Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization

- Computer Science · NIPS
- 2011

This work shows that both the basic proximal-gradient method and the accelerated proximal-gradient method achieve the same convergence rate as in the error-free case, provided that the errors decrease at appropriate rates.

### A Stochastic Approximation Method

- Mathematics
- 2007

Let M(x) denote the expected value at level x of the response to a certain experiment. M(x) is assumed to be a monotone function of x but is unknown to the experimenter, and it is desired to find the…

### Convergence Rate of Incremental Subgradient Algorithms

- Mathematics, Computer Science
- 2001

An incremental approach is considered for minimizing a convex function that consists of the sum of a large number of component functions; this approach has been very successful in solving large differentiable least-squares problems, such as those arising in the training of neural networks.

### Robust Stochastic Approximation Approach to Stochastic Programming

- Computer Science, Mathematics · SIAM J. Optim.
- 2009

It is intended to demonstrate that a properly modified SA approach can be competitive with, and even significantly outperform, the SAA method for a certain class of convex stochastic problems.

### Sparse Online Learning via Truncated Gradient

- Computer Science · NIPS
- 2008

This work proposes a general method called truncated gradient to induce sparsity in the weights of online-learning algorithms with convex loss, and finds that, for datasets with large numbers of features, substantial sparsity is discoverable.
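The truncation operator behind this idea can be sketched as follows (a minimal illustration; the parameter names are chosen for this example, not taken from the paper): weights whose magnitude falls below a threshold are shrunk toward zero and clipped at zero, while larger weights are left untouched. In online learning such an operator is applied to the weight vector every few gradient steps.

```python
def truncate(weights, gravity, threshold):
    # Shrink small weights toward zero by `gravity`, clipping at zero;
    # weights above `threshold` in magnitude are left unchanged.
    out = []
    for w in weights:
        if abs(w) <= threshold:
            shrunk = max(abs(w) - gravity, 0.0)
            out.append(shrunk if w > 0 else -shrunk)
        else:
            out.append(w)
    return out
```

Repeated application drives near-zero weights exactly to zero, which is how sparsity accumulates during online training.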

### Incremental Gradient Algorithms with Stepsizes Bounded Away from Zero

- Computer Science · Comput. Optim. Appl.
- 1998

The first convergence results of any kind for this computationally important case are derived; it is shown that a certain ε-approximate solution can be obtained, and the linear dependence of ε on the stepsize limit is established.

### Gradient methods for minimizing composite objective function

- Computer Science, Mathematics
- 2007

In this paper we analyze several new methods for solving optimization problems with the objective function formed as a sum of two convex terms: one is smooth and given by a black-box oracle, and…

### A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems

- Computer Science, Mathematics · SIAM J. Imaging Sci.
- 2009

A new fast iterative shrinkage-thresholding algorithm (FISTA) is presented which preserves the computational simplicity of ISTA but has a global rate of convergence that is provably significantly better, both theoretically and practically.
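The FISTA iteration pairs a shrinkage (proximal) step with Nesterov-style momentum. A minimal sketch on a one-dimensional lasso toy problem (the setup is invented for illustration; the closed-form solution is soft-thresholding of b):

```python
def soft_threshold(v, lam):
    # Proximal operator of lam * |x| (the shrinkage step).
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

def fista(b, lam, step=0.5, iters=100):
    # Minimize 0.5 * (x - b)**2 + lam * |x| by proximal-gradient steps
    # taken at a momentum-extrapolated point y.
    x = y = 0.0
    t = 1.0
    for _ in range(iters):
        x_new = soft_threshold(y - step * (y - b), lam * step)
        t_new = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        # Nesterov momentum extrapolation
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```

The momentum sequence t is what lifts the O(1/k) rate of plain ISTA to the O(1/k²) rate the paper proves.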