Revisiting Stochastic Extragradient
@inproceedings{Mishchenko2020RevisitingSE,
  title     = {Revisiting Stochastic Extragradient},
  author    = {Konstantin Mishchenko and D. Kovalev and Egor Shulgin and Peter Richt{\'a}rik and Yura Malitsky},
  booktitle = {AISTATS},
  year      = {2020}
}
We fix a fundamental issue in the stochastic extragradient method by providing a new sampling strategy that is motivated by approximating implicit updates. Since the existing stochastic extragradient algorithm of Juditsky et al. (2011), called Mirror-Prox, diverges on a simple bilinear problem when the domain is not bounded, we prove guarantees for solving variational inequalities that go beyond existing settings. Furthermore, we illustrate numerically that the proposed variant converges faster…
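For illustration only, here is a minimal sketch (not code from the paper) contrasting the two sampling strategies on a toy unconstrained bilinear problem: the same-sample variant reuses one stochastic sample for both the extrapolation and the update step, while the independent-samples variant draws a fresh sample for each. The matrices `A_i`, step size, iteration count, and function names below are arbitrary choices made for this sketch.

```python
import numpy as np

# Toy unconstrained bilinear saddle-point problem min_x max_y x^T A_bar y,
# where A_bar is the average of stochastic samples A_i (here A_i = I + noise,
# an arbitrary choice for this sketch). The game operator for sample i is
# F_i(x, y) = (A_i y, -A_i^T x); the solution of the averaged problem is (0, 0).
rng = np.random.default_rng(0)
n, d = 100, 5
A = np.stack([np.eye(d) + 0.5 * rng.standard_normal((d, d)) for _ in range(n)])

def operator(i, x, y):
    """Stochastic operator F_i of the bilinear game at the point (x, y)."""
    return A[i] @ y, -A[i].T @ x

def seg(steps=2000, lr=0.05, same_sample=True):
    """Stochastic extragradient; same_sample=True reuses one sample for both
    the extrapolation and the update step, False draws two independent samples."""
    x, y = np.ones(d), np.ones(d)
    for _ in range(steps):
        i = rng.integers(n)
        gx, gy = operator(i, x, y)                 # extrapolation step
        x_half, y_half = x - lr * gx, y - lr * gy
        j = i if same_sample else rng.integers(n)  # sampling strategy
        gx, gy = operator(j, x_half, y_half)       # update step
        x, y = x - lr * gx, y - lr * gy
    return np.linalg.norm(np.concatenate([x, y]))  # distance to (0, 0)

print("same-sample SEG, distance to solution:        ", seg(same_sample=True))
print("independent-samples SEG, distance to solution:", seg(same_sample=False))
```

Reusing the sample in both steps makes the extrapolation behave like an approximation of an implicit (proximal) update on that sample, which is the motivation stated in the abstract; the independent-samples variant is the one the paper shows can diverge on unbounded bilinear problems.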
30 Citations
Training Generative Adversarial Networks via Stochastic Nash Games.
- Computer Science, Mathematics · IEEE Transactions on Neural Networks and Learning Systems
- 2021
A stochastic relaxed forward-backward algorithm for GANs is proposed, and convergence to an exact solution, or to a neighbourhood of it, is shown when the pseudogradient mapping of the game is monotone; the method is applied to the image generation problem, where it shows computational advantages over the extragradient scheme.
Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise
- Computer Science
- 2022
This work proves the first high-probability complexity results with logarithmic dependence on the confidence level for stochastic methods for solving monotone and structured non-monotone VIPs with non-sub-Gaussian (heavy-tailed) noise and unbounded domains.
Training Generative Adversarial Networks with Adaptive Composite Gradient
- Computer Science · ArXiv
- 2021
The Adaptive Composite Gradient (ACG) method is proposed; it converges linearly in bilinear games under suitable settings and is a semi-gradient-free algorithm, since it does not need to compute the gradient at every step, reducing the computational cost of gradient and Hessian evaluations by exploiting predictive information from future iterations.
Training GANs with predictive projection centripetal acceleration
- Computer Science
- 2020
This work proposes a novel predictive projection centripetal acceleration (PPCA) method to alleviate the cyclic behavior of generative adversarial networks.
Generative Adversarial Networks as stochastic Nash games
- Computer Science · ArXiv
- 2020
A stochastic relaxed forward-backward algorithm for GANs is proposed, and convergence to an exact solution, or to a neighbourhood of it, is shown when the pseudogradient mapping of the game is monotone; the method is applied to the image generation problem, where it shows computational advantages over the extragradient scheme.
Stochastic Extragradient: General Analysis and Improved Rates
- Computer Science · AISTATS
- 2022
A novel theoretical framework is developed that allows several variants of SEG to be analyzed in a unified manner; the resulting convergence guarantees outperform the current state of the art and rely on less restrictive assumptions.
Adversarial Estimation of Riesz Representers
- Computer Science, Mathematics · ArXiv
- 2021
This work provides an adversarial approach to estimating Riesz representers of linear functionals within arbitrary function spaces using a range of recently introduced machine learning techniques, and proves oracle inequalities based on the localized Rademacher complexity of the function space used to approximate the Riesz representer and on the approximation error.
On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging
- Computer Science, Mathematics · AISTATS
- 2022
The stochastic bilinear minimax optimization problem is studied, an analysis of the same-sample Stochastic ExtraGradient method with constant step size is presented, and variations of the method that yield favorable convergence are proposed.
Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity
- Computer Science, Mathematics · NeurIPS
- 2021
The expected co-coercivity condition is introduced, its benefits are explained, and the first last-iterate convergence guarantees of SGDA and SCO under this condition are provided for solving a class of stochastic variational inequality problems that are potentially non-monotone.
Extrapolation for Large-batch Training in Deep Learning
- Computer Science · ICML
- 2020
This work proposes to use computationally efficient extrapolation (extragradient) to stabilize the optimization trajectory while still benefiting from smoothing to avoid sharp minima; it proves the convergence of this novel scheme and rigorously evaluates its empirical performance on ResNet, LSTM, and Transformer models.
References
Showing 1–10 of 33 references
Reducing Noise in GAN Training with Variance Reduced Extragradient
- Computer Science · NeurIPS
- 2019
A novel stochastic variance-reduced extragradient optimization algorithm is proposed which, for a large class of games, improves upon the convergence rates previously proposed in the literature.
Training GANs with Optimism
- Computer Science · ICLR
- 2018
This work addresses the issue of limit cycling behavior in training Generative Adversarial Networks, proposes the use of Optimistic Mirror Descent (OMD) for training Wasserstein GANs, and introduces a new algorithm, Optimistic Adam, an optimistic variant of Adam.
Negative Momentum for Improved Game Dynamics
- Computer Science · AISTATS
- 2019
It is proved that alternating updates are more stable than simultaneous updates, and it is shown both theoretically and empirically that alternating gradient updates with a negative momentum term achieve convergence not only on a difficult toy adversarial problem but also on the notoriously difficult-to-train saturating GANs.
Wasserstein Generative Adversarial Networks
- Computer Science · ICML
- 2017
This work introduces a new algorithm named WGAN, an alternative to traditional GAN training that can improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches.
Solving variational inequalities with Stochastic Mirror-Prox algorithm
- Mathematics, Computer Science
- 2008
A novel Stochastic Mirror-Prox algorithm is developed for solving stochastic variational inequalities (s.v.i.) with monotone operators, and it is shown that with a convenient stepsize strategy it attains the optimal rates of convergence with respect to the problem parameters.
Self-Attention Generative Adversarial Networks
- Computer Science · ICML
- 2019
The proposed SAGAN achieves state-of-the-art results, boosting the best published Inception score from 36.8 to 52.52 and reducing the Fréchet Inception distance from 27.62 to 18.65 on the challenging ImageNet dataset.
Unrolled Generative Adversarial Networks
- Computer Science · ICLR
- 2017
This work introduces a method to stabilize Generative Adversarial Networks by defining the generator objective with respect to an unrolled optimization of the discriminator, and shows how this technique solves the common problem of mode collapse, stabilizes training of GANs with complex recurrent generators, and increases diversity and coverage of the data distribution by the generator.
Last-Iterate Convergence: Zero-Sum Games and Constrained Min-Max Optimization
- Computer Science · ITCS
- 2019
It is shown that OMWU monotonically improves the Kullback-Leibler divergence of the current iterate to the (appropriately normalized) min-max solution until it enters a neighborhood of the solution, within which OMWU becomes a contracting map converging to the exact solution.
Improved Techniques for Training GANs
- Computer Science · NIPS
- 2016
This work focuses on two applications of GANs: semi-supervised learning, and the generation of images that humans find visually realistic, and presents ImageNet samples with unprecedented resolution and shows that the methods enable the model to learn recognizable features of ImageNet classes.
Dual extrapolation and its applications to solving variational inequalities and related problems
- Mathematics, Computer Science · Math. Program.
- 2007
This paper shows that, with an appropriate step-size strategy, the proposed method is optimal both for Lipschitz continuous operators and for operators with bounded variation.