# Topology-aware Generalization of Decentralized SGD

@inproceedings{Zhu2022TopologyawareGO, title={Topology-aware Generalization of Decentralized SGD}, author={Tongtian Zhu and Fengxiang He and Lan Zhang and Zhengyang Niu and Mingli Song and Dacheng Tao}, booktitle={ICML}, year={2022} }

This paper studies the algorithmic stability and generalizability of decentralized stochastic gradient descent (D-SGD). We prove that the consensus model learned by D-SGD is O(m/N + 1/m + λ²)-stable in expectation in the non-convex, non-smooth setting, where N is the total sample size of the whole system, m is the number of workers, and 1 − λ is the spectral gap that measures the connectivity of the communication topology. These results then deliver an O(1/N + ((m⁻¹λ²)^(α/2) + m^(−α))/N^(1 − α/2)) …
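To make the λ in the bound concrete: λ is the second-largest eigenvalue magnitude of the doubly stochastic mixing matrix W that encodes the communication topology, and 1 − λ is its spectral gap. The sketch below is an illustrative example (not code from the paper); `ring_mixing_matrix` and `spectral_gap` are hypothetical helper names, and the ring topology is chosen only because its poor connectivity (small gap) is easy to see.

```python
import numpy as np

def ring_mixing_matrix(m):
    """Doubly stochastic mixing matrix for a ring of m workers,
    where each worker averages equally with itself and its two neighbors."""
    W = np.zeros((m, m))
    for i in range(m):
        W[i, i] = 1 / 3
        W[i, (i - 1) % m] = 1 / 3
        W[i, (i + 1) % m] = 1 / 3
    return W

def spectral_gap(W):
    """Return 1 - lambda, where lambda is the second-largest
    eigenvalue magnitude of the symmetric mixing matrix W."""
    eigvals = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
    return 1 - eigvals[1]

print(spectral_gap(ring_mixing_matrix(16)))  # ≈ 0.05: rings mix slowly
```

A fully connected topology (W with every entry 1/m) has spectral gap 1, the best-connected case; as the gap shrinks toward 0, λ → 1 and the topology-dependent terms in the bounds above grow.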

## Figures and Tables from this paper

## One Citation

### DisPFL: Towards Communication-Efficient Personalized Federated Learning via Decentralized Sparse Training

- Computer Science, ICML
- 2022

A novel personalized federated learning framework with a decentralized (peer-to-peer) communication protocol, named DisPFL, which employs personalized sparse masks to customize sparse local models on the edge and achieves higher model accuracy with less computation cost and fewer communication rounds.

## References

Showing 1–10 of 86 references

### A Unified Theory of Decentralized SGD with Changing Topology and Local Updates

- Computer Science, ICML
- 2020

This paper introduces a unified convergence analysis that covers a large variety of decentralized SGD methods which so far have required different intuitions, have different applications, and which have been developed separately in various communities.

### Data-Dependent Stability of Stochastic Gradient Descent

- Computer Science, ICML
- 2018

A data-dependent notion of algorithmic stability for Stochastic Gradient Descent is established, and novel generalization bounds are developed that exhibit fast convergence rates for SGD subject to a vanishing empirical risk and low noise of the stochastic gradients.

### A Unified and Refined Convergence Analysis for Non-Convex Decentralized Learning

- Computer Science, IEEE Transactions on Signal Processing
- 2022

The results provide improved network-topology-dependent bounds for these methods (such as Exact-Diffusion/D² and gradient tracking) compared with the existing literature, and show that these methods are often less sensitive to the network topology than D-SGD, which agrees with numerical experiments.

### Yes, Topology Matters in Decentralized Optimization: Refined Convergence and Topology Learning under Heterogeneous Data

- Computer Science, arXiv
- 2022

This paper revisits the analysis of the Decentralized Stochastic Gradient Descent (D-SGD) algorithm, a popular decentralized learning algorithm, under data heterogeneity, and argues that neighborhood heterogeneity provides a natural criterion for learning sparse, data-dependent topologies that reduce (and can even eliminate) the otherwise detrimental impact of data heterogeneity on the convergence time of D-SGD.

### Stability and Generalization of the Decentralized Stochastic Gradient Descent

- Computer Science, AAAI
- 2021

Leveraging this formulation together with (non)convex optimization theory, this paper establishes the first stability and generalization guarantees for decentralized stochastic gradient descent.

### Asynchronous Decentralized SGD with Quantized and Local Updates

- Computer Science, NeurIPS
- 2021

This paper implements and deploys the SwarmSGD algorithm, a variant of SGD that can outperform previous decentralized methods in terms of end-to-end training time, and that can even rival carefully tuned large-batch SGD for certain tasks.

### Exponential Graph is Provably Efficient for Decentralized Deep Training

- Computer Science, NeurIPS
- 2021

This work proves that so-called exponential graphs, where every node is connected to O(log(n)) neighbors and n is the total number of nodes, can lead to both fast communication and effective averaging simultaneously, and discovers that a sequence of log(n) one-peer exponential graphs can together achieve exact averaging.
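The exact-averaging claim above can be checked directly for n a power of two: in round k each node averages only with the node 2^k positions ahead, and after log₂(n) rounds every node holds the global mean. This is a minimal sketch under that assumption (the function name `one_peer_exp_matrix` is hypothetical, not from the cited paper).

```python
import numpy as np

def one_peer_exp_matrix(n, k):
    """Mixing matrix for round k of a one-peer exponential graph:
    node i averages only with node (i + 2^k) mod n."""
    W = np.zeros((n, n))
    offset = 2 ** k
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i + offset) % n] = 0.5
    return W

n = 8
x = np.arange(n, dtype=float)  # one scalar value per node
for k in range(int(np.log2(n))):
    x = one_peer_exp_matrix(n, k) @ x
# after log2(n) = 3 rounds, every node holds the exact average 3.5
```

Each round, the set of values a node has averaged over doubles in size, so log₂(n) rounds suffice, while every node communicates with only one peer per round.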

### Graph-Dependent Implicit Regularisation for Distributed Stochastic Subgradient Descent

- Computer Science, Mathematics, J. Mach. Learn. Res.
- 2020

This work proposes graph-dependent implicit regularisation strategies for distributed stochastic subgradient descent (Distributed SGD) for convex problems in multi-agent learning that avoid the need for explicit regularisation in decentralised learning problems, such as adding constraints to the empirical risk minimisation rule.

### Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses

- Computer Science, NeurIPS
- 2020

This work provides sharp upper and lower bounds for several forms of SGD and full-batch GD on arbitrary Lipschitz nonsmooth convex losses, and obtains the first dimension-independent generalization bounds for multi-pass SGD in the nonsmooth case.

### Decentralized Stochastic Non-Convex Optimization over Weakly Connected Time-Varying Digraphs

- Computer Science, Mathematics, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020

This work proposes a decentralized stochastic algorithm that is able to converge to the first-order stationary points of non-convex problems with provable convergence rates by leveraging the perturbed push-sum protocol and gradient-tracking techniques.
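The push-sum protocol mentioned above handles directed graphs, where the mixing matrix is only column-stochastic and plain averaging would be biased. The idea is for each node to track a value x and a weight w under the same updates; the ratio x/w converges to the network-wide average. Below is a bare-bones sketch of that averaging step only (no perturbation or gradient tracking, and `push_sum_average` is a hypothetical name), on a small directed ring.

```python
import numpy as np

def push_sum_average(A, x0, rounds):
    """Push-sum averaging over a column-stochastic matrix A (directed
    graph). Each node keeps a value x and a weight w updated by the
    same linear map; x/w converges to the average of x0."""
    x = x0.astype(float).copy()
    w = np.ones_like(x)
    for _ in range(rounds):
        x = A @ x
        w = A @ w
    return x / w

n = 5
# directed ring: node i keeps half its mass and pushes half to node i+1
A = np.zeros((n, n))
for i in range(n):
    A[i, i] = 0.5
    A[(i + 1) % n, i] = 0.5

vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
est = push_sum_average(A, vals, rounds=100)
# every entry of est is ≈ 3.0, the average of vals
```

The weight vector w corrects for the uneven mass accumulation that a column-stochastic (but not row-stochastic) matrix would otherwise cause, which is exactly what makes push-sum suitable for the weakly connected digraphs studied in the cited paper.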