# PMGT-VR: A decentralized proximal-gradient algorithmic framework with variance reduction

    @article{Ye2020PMGTVRAD,
      title   = {PMGT-VR: A decentralized proximal-gradient algorithmic framework with variance reduction},
      author  = {Haishan Ye and Wei Xiong and Tong Zhang},
      journal = {ArXiv},
      volume  = {abs/2012.15010},
      year    = {2020}
    }

This paper considers the decentralized composite optimization problem. We propose a novel decentralized proximal-gradient algorithmic framework with variance reduction, called PMGT-VR, which combines several techniques including multi-consensus, gradient tracking, and variance reduction. The proposed framework imitates centralized algorithms, and we demonstrate that algorithms under this framework achieve convergence rates similar to those of their centralized…
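The abstract names three building blocks: multi-consensus, gradient tracking, and variance reduction, applied around a proximal-gradient step. The following is a minimal sketch of one plausible PMGT-VR-style instantiation, not the paper's exact method: it uses SVRG-type variance reduction, plain repeated gossip in place of the accelerated multi-consensus routine, and a synthetic toy problem (all data, step sizes, and the ring mixing matrix are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy decentralized composite problem: m nodes, node i holds n local
# least-squares samples; all nodes share an l1 regularizer lam*||x||_1.
m, d, n = 4, 5, 20
x_true = rng.normal(size=d)
A = rng.normal(size=(m, n, d))
b = A @ x_true + 0.1 * rng.normal(size=(m, n))
lam, eta, K = 0.05, 0.05, 3            # l1 weight, step size, gossip rounds

W = np.zeros((m, m))                   # doubly stochastic ring mixing matrix
for i in range(m):
    W[i, i] = 0.5
    W[i, (i + 1) % m] = W[i, (i - 1) % m] = 0.25

def prox(X, t):                        # prox of t*||.||_1: soft-thresholding
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def mix(X):                            # "multi-consensus": K gossip rounds
    for _ in range(K):
        X = W @ X
    return X

def grad_full(i, x):                   # full local gradient at node i
    return A[i].T @ (A[i] @ x - b[i]) / n

def grad_vr(i, x, snap_i, mu_i):       # SVRG-style variance-reduced gradient
    j = rng.integers(n)
    a = A[i, j]
    return a * (a @ x - b[i, j]) - a * (a @ snap_i - b[i, j]) + mu_i

X = np.zeros((m, d))                   # one local iterate per row
mu = np.stack([grad_full(i, X[i]) for i in range(m)])
V = mu.copy()                          # last variance-reduced gradients
G = mu.copy()                          # gradient-tracking estimates

for epoch in range(30):
    snap = X.copy()                    # refresh the SVRG snapshot
    mu = np.stack([grad_full(i, snap[i]) for i in range(m)])
    for _ in range(n):
        X = mix(prox(X - eta * G, eta * lam))  # prox step, then consensus
        V_new = np.stack([grad_vr(i, X[i], snap[i], mu[i]) for i in range(m)])
        G = mix(G + V_new - V)                 # gradient tracking
        V = V_new
```

Because the mixing matrix is doubly stochastic and `G` is initialized to `V`, the tracking update preserves the invariant that the rows of `G` sum to the sum of the local variance-reduced gradients, which is what lets each node follow an estimate of the network-average gradient.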

## 4 Citations

Graph topology invariant gradient and sampling complexity for decentralized and stochastic optimization

- Computer Science
- 2021

New algorithms are proposed whose gradient and sampling complexities are invariant to the graph topology, while their communication complexities remain optimal.

Tackling Data Heterogeneity: A New Unified Framework for Decentralized SGD with Sample-induced Topology

- Computer Science
- ICML
- 2022

A general framework unifying several gradient-based stochastic optimization methods for empirical risk minimization problems both in centralized and distributed scenarios is developed, and the obtained rate can recover the best known results for many existing algorithms.

Decentralized Stochastic Variance Reduced Extragradient Method

- Computer Science
- ArXiv
- 2022

A novel decentralized optimization algorithm, called multi-consensus stochastic variance reduced extragradient, is proposed, which achieves the best known stochastic first-order oracle (SFO) complexity for this problem.

Decentralized Stochastic Proximal Gradient Descent with Variance Reduction over Time-varying Networks

- Computer Science
- ArXiv
- 2021

This paper transforms the decentralized algorithm into a centralized inexact proximal-gradient algorithm with variance reduction. It proves that DPSVRG converges at a rate of O(1/T) for general convex objectives plus a nonsmooth term, where T is the number of iterations, while the convergence rate of DSPG is retarded by the variance of the stochastic gradients.

## References

Showing 1-10 of 45 references.

Variance-Reduced Decentralized Stochastic Optimization With Accelerated Convergence

- Computer Science
- IEEE Transactions on Signal Processing
- 2020

A novel stochastic and decentralized algorithmic framework is proposed to minimize a finite sum of functions available over a network of nodes; it is particularly suitable for problems where large-scale, potentially private data cannot be collected or processed at a centralized server.

A Decentralized Proximal-Gradient Method With Network Independent Step-Sizes and Separated Convergence Rates

- Computer Science
- IEEE Transactions on Signal Processing
- 2019

This paper proposes a novel proximal-gradient algorithm for a decentralized optimization problem with a composite objective containing smooth and nonsmooth terms, whose convergence is governed by two separated rates that match the typical rates of general gradient descent and consensus averaging.

A Proximal Gradient Algorithm for Decentralized Composite Optimization

- Computer Science, Mathematics
- IEEE Transactions on Signal Processing
- 2015

A proximal gradient exact first-order algorithm (PG-EXTRA) is proposed that utilizes the composite structure, attains the best known convergence rate, and is a nontrivial extension of the recent algorithm EXTRA.

Multi-consensus Decentralized Accelerated Gradient Descent

- Computer Science
- ArXiv
- 2020

A novel algorithm is proposed that can achieve near optimal communication complexity, matching the known lower bound up to a logarithmic factor of the condition number of the problem.

Decentralized Proximal Gradient Algorithms With Linear Convergence Rates

- Mathematics, Computer Science
- IEEE Transactions on Automatic Control
- 2021

A general primal-dual algorithmic framework is proposed that unifies many existing state-of-the-art algorithms and establishes linear convergence of the proposed method to the exact minimizer in the presence of a nonsmooth term.

A Linearly Convergent Proximal Gradient Algorithm for Decentralized Optimization

- Computer Science, Mathematics
- NeurIPS
- 2019

This work designs a proximal gradient decentralized algorithm whose fixed point coincides with the desired minimizer and provides a concise proof that establishes its linear convergence.

Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent

- Computer Science
- NIPS
- 2017

This paper studies the D-PSGD algorithm and provides the first theoretical analysis indicating a regime in which decentralized algorithms can outperform centralized algorithms for distributed stochastic gradient descent.
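The D-PSGD update referenced above is simple enough to sketch: each node averages its iterate with its neighbors' via a doubly stochastic mixing matrix, then takes a local stochastic gradient step. Below is a minimal illustrative toy (ring topology, constant step size, synthetic least-squares data; all parameters are assumptions for the sketch, not from the cited paper).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy problem: m nodes, each with n local least-squares samples.
m, d, n = 4, 3, 50
x_true = rng.normal(size=d)
A = rng.normal(size=(m, n, d))
b = A @ x_true + 0.1 * rng.normal(size=(m, n))
eta = 0.02

# Doubly stochastic mixing matrix for a ring of m nodes.
W = np.zeros((m, m))
for i in range(m):
    W[i, i] = 0.5
    W[i, (i + 1) % m] = W[i, (i - 1) % m] = 0.25

X = np.zeros((m, d))          # one local iterate per row
for step in range(3000):
    X = W @ X                 # gossip: average with neighbors
    for i in range(m):        # local stochastic gradient step
        j = rng.integers(n)
        a = A[i, j]
        X[i] -= eta * a * (a @ X[i] - b[i, j])
```

With a constant step size the iterates hover in a noise ball around the pooled least-squares solution rather than converging exactly, which is the behavior the variance-reduced methods in this paper are designed to remove.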

DSA: Decentralized Double Stochastic Averaging Gradient Algorithm

- Computer Science
- J. Mach. Learn. Res.
- 2016

The decentralized double stochastic averaging gradient (DSA) algorithm is proposed as an alternative solution; it relies on strong convexity of the local functions and Lipschitz continuity of the local gradients to guarantee linear convergence, in expectation, of the sequence generated by DSA.

Exact Diffusion for Distributed Optimization and Learning—Part I: Algorithm Development

- Mathematics
- IEEE Transactions on Signal Processing
- 2019

The exact diffusion method is applicable to locally balanced left-stochastic combination matrices which, compared to the conventional doubly stochastic matrix, are more general and able to endow the algorithm with faster convergence rates, more flexible step-size choices, and improved privacy-preserving properties.

Prox-PDA: The Proximal Primal-Dual Algorithm for Fast Distributed Nonconvex Optimization and Learning Over Networks

- Computer Science
- ICML
- 2017

A Proximal Primal-Dual Algorithm (Prox-PDA) is proposed that enables the network nodes to distributedly and collectively compute the set of first-order stationary solutions at a global sublinear rate of O(1/r), where r is the iteration counter.