# Stochastic Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization

@article{Song2020StochasticVR, title={Stochastic Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization}, author={Chaobing Song and Yong Jiang and Yi Ma}, journal={ArXiv}, year={2020}, volume={abs/2006.10281} }

In this paper, we introduce a simplified and unified method for finite-sum convex optimization, named \emph{Stochastic Variance Reduction via Accelerated Dual Averaging (SVR-ADA)}. In the non-strongly convex and smooth setting, SVR-ADA can attain an $O\big(\frac{1}{n}\big)$-accurate solution in $O(n\log\log n)$ stochastic gradient evaluations, where $n$ is the number of samples; meanwhile, SVR-ADA matches the lower bound of this setting up to a $\log\log n$ factor. In the strongly…
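As context for the variance-reduction family SVR-ADA belongs to, here is a minimal sketch of the classical SVRG-style gradient estimator (a component gradient corrected by a snapshot full gradient), applied to an illustrative least-squares finite sum. This is a generic sketch, not the SVR-ADA algorithm itself; the problem data, step size, and snapshot schedule are all illustrative assumptions:

```python
import numpy as np

def svrg_estimator(x, snapshot, mu, i):
    # Unbiased, variance-reduced gradient estimate at x:
    # grad_i(x) - grad_i(snapshot) + full_gradient(snapshot)
    return grad_i(x, i) - grad_i(snapshot, i) + mu

rng = np.random.default_rng(0)
n, d = 50, 5                          # illustrative problem size
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=n)

def grad_i(x, i):
    # Gradient of the i-th component f_i(x) = (a_i^T x - b_i)^2 / 2
    return (A[i] @ x - b[i]) * A[i]

def full_gradient(x):
    return A.T @ (A @ x - b) / n

x = np.zeros(d)
snapshot, mu = x.copy(), full_gradient(x)
step = 0.01                           # illustrative constant step size
for t in range(300):
    i = rng.integers(n)
    x = x - step * svrg_estimator(x, snapshot, mu, i)
    if (t + 1) % 100 == 0:            # refresh the snapshot periodically
        snapshot, mu = x.copy(), full_gradient(x)
```

Because the correction terms cancel in expectation, the estimator stays unbiased while its variance shrinks as both the iterate and the snapshot approach the optimum.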

## 10 Citations

### Adaptive Accelerated (Extra-)Gradient Methods with Variance Reduction

- Computer Science, ICML
- 2022

The finite-sum convex optimization problem focusing on the general convex case is studied, and two novel adaptive VR algorithms are proposed, Adaptive Variance Reduced Accelerated Extra-Gradient (AdaVRAE) and Adaptive Variance Reduced Accelerated Gradient (AdaVRAG), which match the best-known convergence rate of non-adaptive VR methods.

### Variance Reduction via Primal-Dual Accelerated Dual Averaging for Nonsmooth Convex Finite-Sums

- Mathematics, Computer Science, ICML
- 2021

A novel algorithm called Variance Reduction via Primal-Dual Accelerated Dual Averaging (VRPDA2) is proposed, which offers a simpler and more direct algorithm and analysis for general convex finite-sum optimization; experiments reveal competitive performance of VRPDA2 compared to state-of-the-art approaches.

### RECAPP: Crafting a More Efficient Catalyst for Convex Optimization

- Computer Science, ICML
- 2022

This work proposes a novel Relaxed Error Criterion for Accelerated Proximal Point (RECAPP) that eliminates the need for high-accuracy subproblem solutions and applies RECAPP to two canonical problems: finite-sum and max-structured minimization.

### Practical Schemes for Finding Near-Stationary Points of Convex Finite-Sums

- Computer Science, AISTATS
- 2022

This work conducts a systematic study of algorithmic techniques for finding near-stationary points of convex finite-sums and proposes an adaptively regularized accelerated SVRG variant, which does not require knowledge of certain unknown initial constants and achieves near-optimal complexities.

### Stochastic Reweighted Gradient Descent

- Computer Science, ICML
- 2022

Stochastic reweighted gradient descent (SRG) is proposed, a stochastic gradient method based solely on importance sampling that can reduce the variance of the gradient estimator and improve on the asymptotic error of stochastic gradient descent in the strongly convex and smooth case.
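The importance-sampling idea behind SRG can be illustrated with a generic reweighted gradient estimator. The fixed sampling distribution below is a simplifying assumption (SRG itself learns its distribution online), and the least-squares problem and step size are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 40, 3                          # illustrative problem size
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=n)

# Fixed importance-sampling distribution proportional to the per-sample
# smoothness constants ||a_i||^2; SRG itself adapts its distribution online.
p = np.sum(A ** 2, axis=1)
p = p / p.sum()

def reweighted_gradient(x):
    # Sample i ~ p, then reweight by 1/(n p_i) to keep the estimate unbiased:
    # E[g_i(x) / (n p_i)] = (1/n) * sum_i g_i(x).
    i = rng.choice(n, p=p)
    g_i = (A[i] @ x - b[i]) * A[i]    # gradient of f_i(x) = (a_i^T x - b_i)^2 / 2
    return g_i / (n * p[i])

x = np.zeros(d)
for _ in range(500):
    x -= 0.02 * reweighted_gradient(x)
```

Sampling proportionally to per-sample smoothness and reweighting preserves unbiasedness while downweighting the high-variance components of the estimator.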

### Accelerated Convex Optimization with Stochastic Gradients: Generalizing the Strong-Growth Condition

- Computer Science
- 2021

A new condition is given under which stochastic gradients do not slow down the convergence of Nesterov’s accelerated gradient method; it allows modeling problems with constraints and designing new types of oracles (e.g., oracles for SAGA).

### Coordinate Linear Variance Reduction for Generalized Linear Programming

- Computer Science, ArXiv
- 2021

It is shown that Distributionally Robust Optimization problems with ambiguity sets based on both f -divergence and Wasserstein metrics can be reformulated as generalized linear programs (GLPs) by introducing sparsely connected auxiliary variables.

### Accelerating Perturbed Stochastic Iterates in Asynchronous Lock-Free Optimization

- Computer Science, ArXiv
- 2021

We show that stochastic acceleration can be achieved under the perturbed iterate framework (Mania et al., 2017) in asynchronous lock-free optimization, which leads to the optimal incremental gradient…

### SVRG Meets AdaGrad: Painless Variance Reduction

- Computer Science, ArXiv
- 2021

It is proved that a variant of AdaSVRG requires $\tilde{O}(n + 1/\epsilon)$ gradient evaluations to achieve an $O(\epsilon)$-suboptimality, matching the typical rate, but without needing to know problem-dependent constants.

## References

Showing 1–10 of 51 references

### A Simple Stochastic Variance Reduced Algorithm with Fast Convergence Rates

- Computer Science, Mathematics, ICML
- 2018

This paper introduces a simple stochastic variance reduced algorithm (MiG) that enjoys the best-known convergence rates for both strongly convex and non-strongly convex problems, presents its efficient sparse and asynchronous variants, and theoretically analyzes its convergence rates in these settings.

### Breaking the Span Assumption Yields Fast Finite-Sum Minimization

- Computer Science, NeurIPS
- 2018

In this paper, we show that SVRG and SARAH can be modified to be fundamentally faster than all of the other standard algorithms that minimize the sum of $n$ smooth functions, such as SAGA, SAG, SDCA,…

### Tight Complexity Bounds for Optimizing Composite Objectives

- Computer Science, Mathematics, NIPS
- 2016

For smooth functions, it is shown that accelerated gradient descent and an accelerated variant of SVRG are optimal in the deterministic and randomized settings respectively, and that a gradient oracle is sufficient for the optimal rate.

### Katyusha: the first direct acceleration of stochastic gradient methods

- Computer Science, J. Mach. Learn. Res.
- 2017

Katyusha momentum is introduced, a novel "negative momentum" on top of Nesterov's momentum that can be incorporated into a variance-reduction based algorithm to speed it up; in each such case, one could potentially give Katyusha a hug.

### Universal gradient methods for convex optimization problems

- Computer Science, Math. Program.
- 2015

New methods for black-box convex minimization are presented, which demonstrate that the fast rate of convergence, typical for the smooth optimization problems, sometimes can be achieved even on nonsmooth problem instances.

### A unified variance-reduced accelerated gradient method for convex optimization

- Computer Science, Mathematics, NeurIPS
- 2019

Varag is the first accelerated randomized incremental gradient method that benefits from the strong convexity of the data-fidelity term to achieve the optimal linear convergence.

### A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets

- Computer Science, NIPS
- 2012

A new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex, which incorporates a memory of previous gradient values in order to achieve a linear convergence rate.
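The "memory of previous gradient values" idea (the SAG estimator) can be sketched as follows. The quadratic problem, step size, and iteration count are illustrative assumptions, not the paper's exact method or constants:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 30, 4                          # illustrative problem size
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=n)

# SAG keeps the most recent gradient seen for each sample and steps along
# the running average of that table, updated in O(d) work per iteration.
memory = np.zeros((n, d))
avg = np.zeros(d)
x = np.zeros(d)
step = 0.01                           # illustrative constant step size
for _ in range(600):
    i = rng.integers(n)
    g_new = (A[i] @ x - b[i]) * A[i]  # fresh gradient of f_i at the current x
    avg += (g_new - memory[i]) / n    # replace sample i's entry in the average
    memory[i] = g_new
    x -= step * avg
```

Once the table fills, the averaged direction tracks the full gradient closely, which is what enables the linear convergence rate at a per-iteration cost of a single component gradient.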

### A Simpler Approach to Accelerated Stochastic Optimization: Iterative Averaging Meets Optimism

- Computer Science
- 2020

This paper shows that there is a simpler approach to acceleration: applying optimistic online learning algorithms and querying the gradient oracle at the online average of the intermediate optimization iterates, and provides “universal” algorithms that achieve the optimal rate for smooth and non-smooth composite objectives simultaneously without further tuning.

### On Tight Convergence Rates of Without-replacement SGD

- Computer Science, ArXiv
- 2020

This work analyzes step sizes that vary across epochs of without-replacement SGD and shows that the rates hold after $\kappa^c\log(nK)$ epochs for some $c>0$.

### Closing the convergence gap of SGD without replacement

- Computer Science, Mathematics, ICML
- 2020

It is shown that SGD without replacement achieves a rate of $\mathcal{O}\left(\frac{1}{T^2}+\frac{n^2}{T^3}\right)$ when the sum of the functions is a quadratic, and a new lower bound of $\Omega\left(\frac{n}{T^2}\right)$ is offered for strongly convex functions that are sums of smooth functions.