• Corpus ID: 219792827

# Stochastic Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization

@article{Song2020StochasticVR,
title={Stochastic Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization},
author={Chaobing Song and Yong Jiang and Yi Ma},
journal={ArXiv},
year={2020},
volume={abs/2006.10281}
}
• Published 18 June 2020
• Computer Science
• ArXiv
In this paper, we introduce a simplified and unified method for finite-sum convex optimization, named Stochastic Variance Reduction via Accelerated Dual Averaging (SVR-ADA). In the nonstrongly convex and smooth setting, SVR-ADA attains an $O\big(\frac{1}{n}\big)$-accurate solution in $O(n\log\log n)$ stochastic gradient evaluations, where $n$ is the number of samples, and it matches the lower bound for this setting up to a $\log\log n$ factor. In the strongly…
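
The abstract does not spell out the SVR-ADA update rule, so the following is not the authors' method. As a minimal sketch of the variance-reduction idea that finite-sum methods of this kind build on, here is plain (non-accelerated) SVRG applied to a least-squares finite sum. The names `svrg_least_squares` and `objective`, the step size, and the choice of $n$ inner steps per epoch are illustrative assumptions.

```python
import numpy as np


def objective(A, b, x):
    """Finite-sum objective f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2."""
    return 0.5 * np.mean((A @ x - b) ** 2)


def svrg_least_squares(A, b, step, epochs, seed=0):
    """Plain SVRG (no acceleration, no dual averaging) on least squares.

    Each epoch computes one full gradient at a snapshot point and then
    takes n stochastic steps using the variance-reduced estimator
        g = grad_i(x) - grad_i(x_ref) + full_grad(x_ref),
    which is an unbiased estimate of the full gradient at x.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x_ref = np.zeros(d)  # snapshot point, starts at the origin
    for _ in range(epochs):
        # Full gradient at the snapshot: n component-gradient evaluations.
        full_grad = A.T @ (A @ x_ref - b) / n
        x = x_ref.copy()
        for _ in range(n):
            i = rng.integers(n)  # sample one component uniformly
            g_i = A[i] * (A[i] @ x - b[i])          # component gradient at x
            g_i_ref = A[i] * (A[i] @ x_ref - b[i])  # same component at snapshot
            x -= step * (g_i - g_i_ref + full_grad)
        x_ref = x  # use the last inner iterate as the next snapshot
    return x_ref
```

A call might look like `svrg_least_squares(A, b, step=0.1 / np.max(np.sum(A**2, axis=1)), epochs=30)`, where the step is scaled by the largest component smoothness constant. Each epoch here costs about $3n$ component-gradient evaluations, which is the unit in which complexities such as $O(n\log\log n)$ are counted.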

## Figures and Tables from this paper

• Computer Science
ICML
• 2022
The finite-sum convex optimization problem is studied in the general convex case, and two novel adaptive VR algorithms are proposed, Adaptive Variance Reduced Accelerated Extra-Gradient (AdaVRAE) and Adaptive Variance Reduced Accelerated Gradient (AdaVRAG), which match the best-known convergence rates of non-adaptive VR methods.
Variance Reduction via Primal-Dual Accelerated Dual Averaging for Nonsmooth Convex Finite-Sums
• Mathematics, Computer Science
ICML
• 2021
A novel algorithm called Variance Reduction via Primal-Dual Accelerated Dual Averaging (VRPDA2) is proposed, which offers a simpler and more straightforward algorithm and analysis for general convex finite-sum optimization and shows competitive performance compared to state-of-the-art approaches.
RECAPP: Crafting a More Efficient Catalyst for Convex Optimization
• Computer Science
ICML
• 2022
This work proposes a novel Relaxed Error Criterion for Accelerated Proximal Point (RECAPP) that eliminates the need for high-accuracy subproblem solutions and applies RECAPP to two canonical problems: finite-sum and max-structured minimization.
For DRO problems with ambiguity sets defined by f-divergence [ Namkoong and Duchi
• Computer Science
• 2022
It is shown that Distributionally Robust Optimization problems with ambiguity sets based on both $f$-divergence and Wasserstein metrics can be reformulated as generalized linear programs by introducing sparsely connected auxiliary variables.
Practical Schemes for Finding Near-Stationary Points of Convex Finite-Sums
• Computer Science
AISTATS
• 2022
This work conducts a systematic study of algorithmic techniques for finding near-stationary points of convex finite-sums and proposes an adaptively regularized accelerated SVRG variant, which does not require the knowledge of some unknown initial constants and achieves near-optimal complexities.
• Computer Science
ICML
• 2022
This work analyzes the convergence of SRG in the strongly-convex case and shows that, while it does not recover the linear rate of control variates methods, it provably outperforms SGD.
Accelerated Convex Optimization with Stochastic Gradients: Generalizing the Strong-Growth Condition
• Computer Science
• 2021
The new condition for stochastic gradients not to slow down the convergence of Nesterov’s accelerated gradient method allows us to model problems with constraints and design new types of oracles (e.g., oracles for SAGA).
Coordinate Linear Variance Reduction for Generalized Linear Programming
• Computer Science
ArXiv
• 2021
It is shown that Distributionally Robust Optimization problems with ambiguity sets based on both $f$-divergence and Wasserstein metrics can be reformulated as generalized linear programs (GLPs) by introducing sparsely connected auxiliary variables.
Accelerating Perturbed Stochastic Iterates in Asynchronous Lock-Free Optimization
• Computer Science
ArXiv
• 2021
We show that stochastic acceleration can be achieved under the perturbed iterate framework (Mania et al., 2017) in asynchronous lock-free optimization, which leads to the optimal incremental gradient
• Computer Science
ArXiv
• 2021
It is proved that a variant of AdaSVRG requires $\tilde{O}(n + 1/\epsilon)$ gradient evaluations to achieve an $O(\epsilon)$-suboptimality, matching the typical rate, but without needing to know problem-dependent constants.

## References

A Simple Stochastic Variance Reduced Algorithm with Fast Convergence Rates
• Computer Science, Mathematics
ICML
• 2018
This paper introduces a simple stochastic variance reduced algorithm (MiG), which enjoys the best-known convergence rates for both strongly convex and non-strongly convex problems, presents its efficient sparse and asynchronous variants, and theoretically analyzes their convergence rates in these settings.
Breaking the Span Assumption Yields Fast Finite-Sum Minimization
• Computer Science
NeurIPS
• 2018
In this paper, we show that SVRG and SARAH can be modified to be fundamentally faster than all of the other standard algorithms that minimize the sum of $n$ smooth functions, such as SAGA, SAG, SDCA,
Tight Complexity Bounds for Optimizing Composite Objectives
• Computer Science, Mathematics
NIPS
• 2016
For smooth functions, it is shown that accelerated gradient descent and an accelerated variant of SVRG are optimal in the deterministic and randomized settings respectively, and that a gradient oracle is sufficient for the optimal rate.
Katyusha: the first direct acceleration of stochastic gradient methods
Katyusha momentum is introduced, a novel "negative momentum" on top of Nesterov's momentum that can be incorporated into a variance-reduction based algorithm and speed it up, and in each of such cases, one could potentially give Katyusha a hug.
Universal gradient methods for convex optimization problems
New methods for black-box convex minimization are presented, which demonstrate that the fast rate of convergence, typical for the smooth optimization problems, sometimes can be achieved even on nonsmooth problem instances.
A unified variance-reduced accelerated gradient method for convex optimization
• Computer Science, Mathematics
NeurIPS
• 2019
Varag is the first accelerated randomized incremental gradient method that benefits from the strong convexity of the data-fidelity term to achieve the optimal linear convergence.
A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets
• Computer Science
NIPS
• 2012
A new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex, which incorporates a memory of previous gradient values in order to achieve a linear convergence rate.
A Simpler Approach to Accelerated Stochastic Optimization: Iterative Averaging Meets Optimism
• Computer Science
• 2020
This paper shows that there is a simpler approach to acceleration: applying optimistic online learning algorithms and querying the gradient oracle at the online average of the intermediate optimization iterates, and provides “universal” algorithms that achieve the optimal rate for smooth and non-smooth composite objectives simultaneously without further tuning.
On Tight Convergence Rates of Without-replacement SGD
• Computer Science
ArXiv
• 2020
By analyzing step sizes that vary across epochs of without-replacement SGD, this work shows that the rates hold after $\kappa^c\log(nK)$ epochs for some $c>0$.
Closing the convergence gap of SGD without replacement
• Computer Science, Mathematics
ICML
• 2020
It is shown that SGD without replacement achieves a rate of $\mathcal{O}\left(\frac{1}{T^2}+\frac{n^2}{T^3}\right)$ when the sum of the functions is a quadratic, and a new lower bound of $\Omega\left(\frac{n}{T^2}\right)$ is offered for strongly convex functions that are sums of smooth functions.