Corpus ID: 219792827

Stochastic Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization

Chaobing Song, Yong Jiang, Yi Ma
In this paper, we introduce a simplified and unified method for finite-sum convex optimization, named Stochastic Variance Reduction via Accelerated Dual Averaging (SVR-ADA). In the non-strongly convex and smooth setting, SVR-ADA attains an $O\big(\frac{1}{n}\big)$-accurate solution using $O(n\log\log n)$ stochastic gradient evaluations, where $n$ is the number of samples; SVR-ADA thus matches the lower bound for this setting up to a $\log\log n$ factor. In the strongly…
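The abstract above is truncated and does not spell out the SVR-ADA update itself, so the following is only a minimal sketch of the general idea behind stochastic variance reduction for a finite sum, using the plain SVRG estimator rather than the paper's accelerated dual averaging scheme. The function name `svrg_least_squares`, the scalar least-squares objective, and the step size are all assumptions chosen for illustration.

```python
import random


def svrg_least_squares(a, b, step, epochs, seed=0):
    """Plain SVRG (not SVR-ADA) on f(x) = (1/n) * sum_i 0.5 * (a[i]*x - b[i])**2.

    Hypothetical helper for illustration; x is a scalar to keep the sketch short.
    """
    rng = random.Random(seed)
    n = len(a)

    def grad_i(i, x):
        # Gradient of the i-th component 0.5 * (a[i]*x - b[i])**2.
        return a[i] * (a[i] * x - b[i])

    x = 0.0
    for _ in range(epochs):
        snapshot = x
        # Full gradient at the snapshot: n component-gradient evaluations.
        full_grad = sum(grad_i(i, snapshot) for i in range(n)) / n
        for _ in range(n):
            i = rng.randrange(n)
            # Unbiased estimate of the full gradient at x; its variance
            # shrinks as x and the snapshot both approach the minimizer.
            g = grad_i(i, x) - grad_i(i, snapshot) + full_grad
            x -= step * g
    return x


# Example data (assumed): the minimizer is sum(a*b) / sum(a*a).
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.5, 5.5, 8.2]
x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)
x_hat = svrg_least_squares(a, b, step=0.02, epochs=50)
print(abs(x_hat - x_star))
```

Each epoch here costs $n$ component gradients for the snapshot plus two per inner step, which is the kind of stochastic gradient evaluation count that the complexity bounds in the abstract refer to.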

Adaptive Accelerated (Extra-)Gradient Methods with Variance Reduction
The finite-sum convex optimization problem is studied in the general convex case, and two novel adaptive VR algorithms are proposed, Adaptive Variance Reduced Accelerated Extra-Gradient (AdaVRAE) and Adaptive Variance Reduced Accelerated Gradient (AdaVRAG), which match the best-known convergence rates of non-adaptive VR methods.
Variance Reduction via Primal-Dual Accelerated Dual Averaging for Nonsmooth Convex Finite-Sums
A novel algorithm called Variance Reduction via Primal-Dual Accelerated Dual Averaging (VRPDA2) is proposed, which offers a simpler and more straightforward algorithm and analysis for general convex finite-sum optimization and shows competitive performance compared to state-of-the-art approaches.
RECAPP: Crafting a More Efficient Catalyst for Convex Optimization
This work proposes a novel Relaxed Error Criterion for Accelerated Proximal Point (RECAPP) that eliminates the need for high-accuracy subproblem solutions, and applies RECAPP to two canonical problems: finite-sum and max-structured minimization.
Practical Schemes for Finding Near-Stationary Points of Convex Finite-Sums
This work conducts a systematic study of algorithmic techniques for finding near-stationary points of convex finite-sums and proposes an adaptively regularized accelerated SVRG variant, which does not require the knowledge of some unknown initial constants and achieves near-optimal complexities.
Stochastic Reweighted Gradient Descent
This work analyzes the convergence of SRG in the strongly-convex case and shows that, while it does not recover the linear rate of control variates methods, it provably outperforms SGD.
Accelerated Convex Optimization with Stochastic Gradients: Generalizing the Strong-Growth Condition
The new condition for stochastic gradients not to slow down the convergence of Nesterov’s accelerated gradient method allows us to model problems with constraints and design new types of oracles (e.g., oracles for SAGA).
Coordinate Linear Variance Reduction for Generalized Linear Programming
It is shown that Distributionally Robust Optimization problems with ambiguity sets based on both $f$-divergence and Wasserstein metrics can be reformulated as generalized linear programs (GLPs) by introducing sparsely connected auxiliary variables.
Accelerating Perturbed Stochastic Iterates in Asynchronous Lock-Free Optimization
We show that stochastic acceleration can be achieved under the perturbed iterate framework (Mania et al., 2017) in asynchronous lock-free optimization, which leads to the optimal incremental gradient
SVRG Meets AdaGrad: Painless Variance Reduction
It is proved that a variant of AdaSVRG requires $\tilde{O}(n + 1/\epsilon)$ gradient evaluations to achieve an $O(\epsilon)$-suboptimality, matching the typical rate, but without needing to know problem-dependent constants.


A Simple Stochastic Variance Reduced Algorithm with Fast Convergence Rates
This paper introduces a simple stochastic variance reduced algorithm (MiG), which enjoys the best-known convergence rates for both strongly convex and non-strongly convex problems, presents its efficient sparse and asynchronous variants, and theoretically analyzes its convergence rates in these settings.
Breaking the Span Assumption Yields Fast Finite-Sum Minimization
In this paper, we show that SVRG and SARAH can be modified to be fundamentally faster than all of the other standard algorithms that minimize the sum of $n$ smooth functions, such as SAGA, SAG, SDCA,
Tight Complexity Bounds for Optimizing Composite Objectives
For smooth functions, it is shown that accelerated gradient descent and an accelerated variant of SVRG are optimal in the deterministic and randomized settings respectively, and that a gradient oracle is sufficient for the optimal rate.
Katyusha: the first direct acceleration of stochastic gradient methods
Katyusha momentum is introduced, a novel "negative momentum" on top of Nesterov's momentum that can be incorporated into a variance-reduction based algorithm and speed it up, and in each of such cases, one could potentially give Katyusha a hug.
Universal gradient methods for convex optimization problems
New methods for black-box convex minimization are presented, which demonstrate that the fast rate of convergence, typical for the smooth optimization problems, sometimes can be achieved even on nonsmooth problem instances.
A unified variance-reduced accelerated gradient method for convex optimization
Varag is the first accelerated randomized incremental gradient method that benefits from the strong convexity of the data-fidelity term to achieve the optimal linear convergence.
A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets
A new stochastic gradient method is proposed for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex; it incorporates a memory of previous gradient values in order to achieve a linear convergence rate.
A Simpler Approach to Accelerated Stochastic Optimization: Iterative Averaging Meets Optimism
This paper shows that there is a simpler approach to acceleration: applying optimistic online learning algorithms and querying the gradient oracle at the online average of the intermediate optimization iterates, and provides “universal” algorithms that achieve the optimal rate for smooth and non-smooth composite objectives simultaneously without further tuning.
On Tight Convergence Rates of Without-replacement SGD
By analyzing step sizes that vary across epochs of without-replacement SGD, this work shows that the rates hold after $\kappa^c\log(nK)$ epochs for some $c>0$.
Closing the convergence gap of SGD without replacement
It is shown that SGD without replacement achieves a rate of $\mathcal{O}\left(\frac{1}{T^2}+\frac{n^2}{T^3}\right)$ when the sum of the functions is a quadratic, and a new lower bound of $\Omega\left(\frac{n}{T^2}\right)$ is given for strongly convex functions that are sums of smooth functions.