Corpus ID: 235436066

Decentralized Local Stochastic Extra-Gradient for Variational Inequalities

Aleksandr Beznosikov, Pavel E. Dvurechensky, Anastasia Koloskova, Valentin Samokhin, Sebastian U. Stich, Alexander V. Gasnikov
We consider distributed stochastic variational inequalities (VIs) on unbounded domains with the problem data being heterogeneous (non-IID) and distributed across many devices. We make a very general assumption on the computational network that, in particular, covers the settings of fully decentralized computation with time-varying networks and the centralized topologies commonly used in Federated Learning. Moreover, multiple local updates can be performed on the workers to reduce the communication…
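At the core of the method summarized above is the extragradient update for a variational inequality. The following is a minimal single-node sketch of that basic update (the paper's actual algorithm adds stochastic oracles, local steps, and decentralized communication on top of it); the operator, step size, and iteration count are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def extragradient(z0, gamma=0.1, iters=2000):
    """Extragradient iteration for the monotone operator F(x, y) = (y, -x),
    i.e. the VI associated with the bilinear saddle point min_x max_y x*y."""
    F = lambda z: np.array([z[1], -z[0]])  # monotone, 1-Lipschitz operator
    z = np.array(z0, dtype=float)
    for _ in range(iters):
        z_half = z - gamma * F(z)   # extrapolation (look-ahead) step
        z = z - gamma * F(z_half)   # update step, using F at the look-ahead point
    return z

z_star = extragradient([1.0, 1.0])
print(np.linalg.norm(z_star))  # distance to the solution (0, 0) shrinks geometrically here
```

The extrapolation step is what distinguishes extragradient from plain gradient descent-ascent, which would cycle or diverge on this bilinear example.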


A Faster Decentralized Algorithm for Nonconvex Minimax Problems
This paper proposes a new, faster decentralized algorithm for nonconvex minimax problems, named DM-HSGD, using the variance-reduction technique of hybrid stochastic gradient descent, and proves that the algorithm achieves linear speedup with respect to the number of workers.
Extragradient Method: O(1/K) Last-Iterate Convergence for Monotone Variational Inequalities and Connections With Cocoercivity
The first last-iterate O(1/K) convergence rate for EG on monotone and Lipschitz variational inequality problems is derived without any additional assumptions on the operator, stated in terms of reducing the squared norm of the operator.
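A short demonstration of why the last-iterate behavior of EG matters: on the bilinear problem min_x max_y x*y, simultaneous gradient descent-ascent (GDA) spirals outward, while EG's last iterate converges. The step size and iteration count are arbitrary choices for the demo.

```python
import numpy as np

# Operator of the bilinear saddle point min_x max_y x*y.
F = lambda z: np.array([z[1], -z[0]])

def gda(z, gamma=0.1, iters=500):
    """Simultaneous gradient descent-ascent: diverges on this problem."""
    for _ in range(iters):
        z = z - gamma * F(z)
    return z

def eg(z, gamma=0.1, iters=500):
    """Extragradient: last iterate converges toward the solution (0, 0)."""
    for _ in range(iters):
        z = z - gamma * F(z - gamma * F(z))
    return z

z0 = np.array([1.0, 1.0])
print(np.linalg.norm(gda(z0)))  # grows: GDA spirals away from (0, 0)
print(np.linalg.norm(eg(z0)))   # shrinks: EG's last iterate approaches (0, 0)
```

On this example one can verify directly that GDA multiplies the squared distance to the solution by 1 + γ² per step, while EG multiplies it by (1 − γ²)² + γ² < 1.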
Federated Minimax Optimization: Improved Convergence Analyses and Algorithms
This paper analyzes Local stochastic gradient descent ascent (SGDA), the local-update version of the SGDA algorithm, and proposes a momentum-based local-update algorithm that has the same convergence guarantees but outperforms Local SGDA, as demonstrated in the authors' experiments.
Distributed Saddle-Point Problems Under Similarity
This work studies solution methods for (strongly-)convex-(strongly-)concave Saddle-Point Problems (SPPs) over networks of two types, master/workers architectures and mesh networks, and proposes algorithms matching the lower bounds over both types of networks (up to log factors).
A stochastic optimization algorithm based on kernel methods is proposed that, in many cases, has better total complexity than the Stochastic Approximation approach combined with the Sinkhorn algorithm.
Optimal Decentralized Algorithms for Saddle Point Problems over Time-Varying Networks
This work studies saddle-point problems of sum type, where the summands are held by separate computational entities connected by a network, obtains lower complexity bounds for algorithms in this setup, and develops optimal methods that meet the lower bounds.
Near-Optimal Decentralized Algorithms for Saddle Point Problems over Time-Varying Networks
This work studies saddle-point problems of sum type, where the summands are held by separate computational entities connected by a network, obtains lower complexity bounds for algorithms in this setup, and develops near-optimal methods that meet the lower bounds.
Recent theoretical advances in decentralized distributed convex optimization.
This paper focuses on how the results of decentralized distributed convex optimization can be explained based on optimal algorithms for the non-distributed setup, and provides recent results that have not been published yet.
Federated Learning in Edge Computing: A Systematic Survey
A systematic survey of the literature on the implementation of FL in EC environments is provided, with a taxonomy identifying advanced solutions and open problems, to help researchers better understand the connection between FL and EC enabling technologies and concepts.
Local Stochastic Gradient Descent Ascent: Convergence Analysis and Communication Efficiency
It is shown that Local SGDA can provably optimize distributed minimax problems with both homogeneous and heterogeneous data using a reduced number of communication rounds, and convergence rates are established in the strongly-convex-strongly-concave and nonconvex-strongly-concave settings.
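The local-update pattern this entry describes can be sketched in a few lines: each worker takes several descent-ascent steps on its own objective, then the iterates are averaged. The toy objectives f_i(x, y) = ½(x − a_i)² − ½(y − b_i)², the step size, and the schedule below are made-up illustrative choices (deterministic gradients stand in for the stochastic ones); the global saddle point is (mean a_i, mean b_i).

```python
import numpy as np

def local_sgda(a, b, gamma=0.2, local_steps=5, rounds=100):
    """Toy Local (S)GDA: K local descent-ascent steps per worker, then averaging.

    Worker i holds f_i(x, y) = 0.5*(x - a[i])**2 - 0.5*(y - b[i])**2,
    strongly convex in x and strongly concave in y.
    """
    n = len(a)
    xs, ys = np.zeros(n), np.zeros(n)     # per-worker iterates
    for _ in range(rounds):
        for _ in range(local_steps):      # K local steps without communication
            grad_x = xs - a               # d f_i / dx
            grad_y = -(ys - b)            # d f_i / dy
            xs = xs - gamma * grad_x      # descent in x
            ys = ys + gamma * grad_y      # ascent in y
        xs[:] = xs.mean()                 # communication round:
        ys[:] = ys.mean()                 # average iterates across workers
    return xs[0], ys[0]

a = np.array([1.0, 3.0, -2.0])            # heterogeneous (non-IID) worker data
b = np.array([0.0, 2.0, 4.0])
x_hat, y_hat = local_sgda(a, b)
print(x_hat, y_hat)  # approaches the global saddle point (a.mean(), b.mean())
```

On this strongly-convex-strongly-concave toy the averaged iterate converges to the global saddle point despite the heterogeneous local objectives; the papers above quantify how the number of local steps trades off against this heterogeneity.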
Revisiting Stochastic Extragradient
This work fixes a fundamental issue in the stochastic extragradient method by providing a new sampling strategy, motivated by approximating implicit updates, and proves guarantees for solving variational inequalities that go beyond existing settings.
A Unified Theory of Decentralized SGD with Changing Topology and Local Updates
This paper introduces a unified convergence analysis that covers a large variety of decentralized SGD methods which so far have required different intuitions, have different applications, and which have been developed separately in various communities.
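One instance of the decentralized SGD pattern covered by such unified analyses: each node takes a local gradient step on its own objective, then averages with its neighbors via a doubly stochastic gossip matrix. The ring topology, weights, and quadratic objectives f_i(x) = ½(x − c_i)² below are illustrative assumptions, not a specific method from the paper.

```python
import numpy as np

def decentralized_sgd(c, gamma=0.3, iters=300):
    """Toy decentralized gradient descent on a ring of n nodes.

    Node i holds f_i(x) = 0.5*(x - c[i])**2; each iteration does a local
    gradient step followed by gossip averaging with its two ring neighbors.
    """
    n = len(c)
    # Doubly stochastic ring gossip matrix: 1/3 to self and each neighbor.
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0
    x = np.zeros(n)                       # one scalar iterate per node
    for _ in range(iters):
        x = W @ (x - gamma * (x - c))     # local gradient step, then gossip
    return x

c = np.array([4.0, -1.0, 2.0, 3.0, -3.0])  # heterogeneous node data
x = decentralized_sgd(c)
# Iterates settle near the network-average optimum c.mean() = 1.0; the
# constant step size leaves a small residual heterogeneity bias per node.
print(x)
```

Because W is doubly stochastic, the average of the iterates follows a centralized gradient step exactly; the spectral gap of W controls how fast the individual nodes agree, which is the quantity the unified analyses track.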
Decentralized Parallel Algorithm for Training Generative Adversarial Nets
This paper designs the first gradient-based decentralized parallel algorithm that allows workers to have multiple rounds of communication in one iteration and to update the discriminator and generator simultaneously; this design makes the proposed decentralized algorithm amenable to convergence analysis.
Efficient Methods for Structured Nonconvex-Nonconcave Min-Max Optimization
A new class of structured nonconvex-nonconcave min-max optimization problems is introduced, along with a generalization of the extragradient algorithm that provably converges to a stationary point; its iteration-complexity and sample-complexity bounds either match or improve the best known bounds.
Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data
This paper investigates and identifies the limitation of several decentralized optimization algorithms for different degrees of data heterogeneity, and proposes a novel momentum-based method to mitigate this decentralized training difficulty.
Consensus Control for Decentralized Deep Learning
It is shown theoretically that when the consensus distance during training is lower than a critical quantity, decentralized training converges as fast as its centralized counterpart, and these empirical insights allow the principled design of better decentralized training schemes that mitigate the performance drop.
Local SGD Converges Fast and Communicates Little
Concise convergence rates for local SGD on convex problems are proved, showing that it converges at the same rate as mini-batch SGD in terms of the number of evaluated gradients; that is, the scheme achieves linear speedup in the number of workers and the mini-batch size.
Stochastic Gradient Push for Distributed Deep Learning
Stochastic Gradient Push (SGP) is studied: it is proved that SGP converges to a stationary point of smooth, non-convex objectives at the same sub-linear rate as SGD, and that all nodes achieve consensus.
An adaptive Mirror-Prox method for variational inequalities with singular operators
A novel smoothness condition is proposed that relates the variation of the operator to that of a suitably chosen Bregman function, and from it an adaptive mirror-prox algorithm is derived that attains an O(1/T) convergence rate in problems with possibly singular operators.