# Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity

```bibtex
@inproceedings{Loizou2021StochasticGD,
  title     = {Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity},
  author    = {Nicolas Loizou and Hugo Berard and Gauthier Gidel and Ioannis Mitliagkas and Simon Lacoste-Julien},
  booktitle = {Neural Information Processing Systems},
  year      = {2021}
}
```

Two of the most prominent algorithms for solving unconstrained smooth games are the classical stochastic gradient descent-ascent (SGDA) and the recently introduced stochastic consensus optimization (SCO) [Mescheder et al., 2017]. SGDA is known to converge to a stationary point for specific classes of games, but current convergence analyses require a bounded variance assumption. SCO is used successfully for solving large-scale adversarial problems, but its convergence guarantees are limited to…
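As a concrete illustration, SGDA performs simultaneous stochastic gradient steps, descending in the min variable and ascending in the max variable. Below is a minimal sketch on a strongly convex-strongly concave quadratic game; the game, step size, and noise model are illustrative assumptions, not the paper's exact setting.

```python
import numpy as np

# Minimal SGDA sketch on the strongly convex-strongly concave game
#   min_x max_y  f(x, y) = (mu/2)||x||^2 + x^T B y - (mu/2)||y||^2,
# with stochastic gradients simulated by additive Gaussian noise.
rng = np.random.default_rng(0)
d, mu, sigma = 5, 1.0, 0.1
B = rng.standard_normal((d, d)) / np.sqrt(d)

gamma, steps = 0.1, 2000
x, y = np.ones(d), np.ones(d)
for _ in range(steps):
    gx = mu * x + B @ y + sigma * rng.standard_normal(d)    # stochastic grad_x f
    gy = B.T @ x - mu * y + sigma * rng.standard_normal(d)  # stochastic grad_y f
    x, y = x - gamma * gx, y + gamma * gy                   # descent in x, ascent in y

# With a constant step size, the iterates settle in a noise-dominated
# neighbourhood of the unique equilibrium (0, 0).
```

The constant step size yields linear convergence only to a neighbourhood whose radius scales with the gradient noise, which is the behaviour the paper's analysis quantifies under expected co-coercivity.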

## 15 Citations

### Tight Analysis of Extra-gradient and Optimistic Gradient Methods For Nonconvex Minimax Problems

- Computer Science, ArXiv
- 2022

These results will advance the theoretical understanding of OGDA and EG methods for solving complicated nonconvex minimax real-world problems, e.g., Generative Adversarial Networks (GANs) or robust neural network training.

### Faster Single-loop Algorithms for Minimax Optimization without Strong Concavity

- Computer Science, AISTATS
- 2022

New convergence results are established for two alternative single-loop algorithms – alternating GDA and smoothed GDA – under the mild assumption that the objective satisfies the Polyak-Łojasiewicz (PL) condition with respect to one variable.

### Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise

- Computer Science, ArXiv
- 2022

This work proves the first high-probability complexity results with logarithmic dependence on the confidence level for stochastic methods for solving monotone and structured non-monotone VIPs with non-sub-Gaussian (heavy-tailed) noise and unbounded domains.

### Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods

- Computer Science, ArXiv
- 2022

A unified convergence analysis is proposed that covers a large variety of stochastic gradient descent-ascent methods, which so far have required different intuitions, have different applications, and have been developed separately in various communities.

### SGDA with shuffling: faster convergence for nonconvex-PŁ minimax optimization

- Computer Science
- 2022

This work studies the convergence bounds of SGDA with random reshuffling (SGDA-RR) for smooth nonconvex-nonconcave objectives with Polyak-Łojasiewicz (PŁ) geometry, and presents a comprehensive lower bound for two-time-scale GDA, which matches the full-batch rate for the primal-PŁ-PŁ case.

### ProxSkip for Stochastic Variational Inequalities: A Federated Learning Algorithm for Provable Communication Acceleration

- Computer Science
- 2022

The ProxSkip-VIP algorithm is proposed, which generalizes the original ProxSkip framework to VIPs, and it is explained how the approach achieves acceleration in terms of communication complexity over existing state-of-the-art FL algorithms.

### Sampling without Replacement Leads to Faster Rates in Finite-Sum Minimax Optimization

- Computer Science, ArXiv
- 2022

The convergence rates of stochastic gradient algorithms for smooth and strongly convex-strongly concave optimization are analyzed and it is shown that, for many such algorithms, sampling the data points without replacement leads to faster convergence compared to sampling with replacement.

### Stochastic Extragradient: General Analysis and Improved Rates

- Computer Science, AISTATS
- 2022

A novel theoretical framework is developed that allows several variants of SEG to be analyzed in a unified manner; the resulting guarantees outperform the current state-of-the-art convergence guarantees and rely on less restrictive assumptions.

### Extragradient Method: O(1/K) Last-Iterate Convergence for Monotone Variational Inequalities and Connections With Cocoercivity

- Mathematics, AISTATS
- 2022

The first last-iterate O(1/K) convergence rate for EG for monotone and Lipschitz VIPs is derived without any additional assumptions on the operator, given in terms of reducing the squared norm of the operator.

### High Probability Generalization Bounds with Fast Rates for Minimax Problems

- Computer Science, ICLR
- 2022

This paper provides improved generalization analyses and obtains sharper high-probability generalization bounds for most existing generalization measures of minimax problems, and uses the improved learning bounds to establish high-probability generalization bounds with fast rates for the classical empirical saddle point (ESP) solution and several popular gradient-based optimization algorithms.

## References

Showing 1–10 of 84 references

### Stochastic Hamiltonian Gradient Methods for Smooth Games

- Computer Science, Mathematics, ICML
- 2020

This work proposes a novel unbiased estimator for stochastic Hamiltonian gradient descent (SHGD) and shows that SHGD converges linearly to the neighbourhood of a stationary point, providing the first global non-asymptotic last-iterate convergence guarantees for certain classes of stochastic smooth games.

### Last-iterate convergence rates for min-max optimization

- Computer Science, ALT
- 2021

It is shown that the Hamiltonian Gradient Descent (HGD) algorithm achieves linear convergence in a variety of more general settings, including convex-concave problems that satisfy a "sufficiently bilinear" condition.

### Stochastic Gradient Descent on Nonconvex Functions with General Noise Models

- Computer Science
- 2021

The scope of nonconvex problems and noise models to which SGD can be applied with rigorous guarantees of its global behavior is broadened, and it is shown that the norm of the gradient evaluated at SGD's iterates converges to zero with probability one and in expectation.

### Optimistic Dual Extrapolation for Coherent Non-monotone Variational Inequalities

- Mathematics, Computer Science, NeurIPS
- 2020

OptDE is proposed, a method that performs only one gradient evaluation per iteration, that is provably convergent to a strong solution under different coherent non-monotone assumptions and provides the near-optimal O(1/ε log(1/ε)) convergence guarantee in terms of the restricted strong merit function for monotone variational inequalities.

### SGD and Hogwild! Convergence Without the Bounded Gradients Assumption

- Computer Science, ICML
- 2018

It is shown that for stochastic problems arising in machine learning such a bound always holds, and an alternative convergence analysis of SGD with a diminishing learning rate regime is proposed, which results in more relaxed conditions than those in (Bottou et al., 2016).

### The Mechanics of n-Player Differentiable Games

- Computer Science, ICML
- 2018

The key result is to decompose the second-order dynamics into two components: the first relates to potential games, which reduce to gradient descent on an implicit function; the second relates to Hamiltonian games, a new class of games that obey a conservation law, akin to conservation laws in classical mechanical systems.
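The Hamiltonian component admits a simple numerical illustration: for a joint vector field ξ(w), Hamiltonian gradient descent minimizes H(w) = ½‖ξ(w)‖² by following ∇H = Jᵀξ, where J is the Jacobian of ξ. The bilinear game below is an illustrative choice (where plain simultaneous gradient play cycles), not the paper's experiment.

```python
import numpy as np

# Hamiltonian gradient descent on the bilinear game f(x, y) = x^T A y.
# Joint field: xi(w) = (grad_x f, -grad_y f) = (A y, -A^T x);
# Jacobian:    J = [[0, A], [-A^T, 0]], so grad H = J^T xi = (-A xi_y, A^T xi_x).
rng = np.random.default_rng(1)
d = 4
A = rng.standard_normal((d, d))

gamma = 0.9 / np.linalg.norm(A, 2) ** 2   # step size below 1/L for H
x, y = np.ones(d), np.ones(d)
for _ in range(500):
    vx, vy = A @ y, -A.T @ x              # xi(w)
    x, y = x - gamma * (-A @ vy), y - gamma * (A.T @ vx)

# H(w) = 0.5 * (||A y||^2 + ||A^T x||^2) decreases monotonically,
# driving the iterates toward the equilibrium (0, 0).
```

Descending H rather than following the raw game dynamics is what sidesteps the rotational (conservative) component that makes simultaneous gradient play cycle on bilinear games.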

### Efficient Methods for Structured Nonconvex-Nonconcave Min-Max Optimization

- Computer Science, AISTATS
- 2021

A new class of structured nonconvex-nonconcave min-max optimization problems is introduced, along with a generalization of the extragradient algorithm that provably converges to a stationary point; its iteration and sample complexity bounds either match or improve the best known bounds.

### A Tight and Unified Analysis of Gradient-Based Methods for a Whole Spectrum of Differentiable Games

- Computer Science, AISTATS
- 2020

A tight analysis of EG’s convergence rate in games shows that, unlike in convex minimization, EG may be much faster than GD, and it is proved that EG achieves the optimal rate for a wide class of algorithms with any number of extrapolations.

### Better Theory for SGD in the Nonconvex World

- Computer Science, ArXiv
- 2020

A new variant of the recently introduced expected smoothness assumption which governs the behaviour of the second moment of the stochastic gradient is proposed and it is shown that this assumption is both more general and more reasonable than assumptions made in all prior work.

### SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation

- Computer Science, Mathematics, AISTATS
- 2021

The Expected Residual condition is shown to be strictly weaker than previously used growth conditions, expected smoothness, or bounded variance assumptions, and the best known convergence rates of full gradient descent and single-element-sampling SGD are recovered.