Topology-aware Generalization of Decentralized SGD

@inproceedings{Zhu2022TopologyawareGO,
  title={Topology-aware Generalization of Decentralized SGD},
  author={Tongtian Zhu and Fengxiang He and Lan Zhang and Zhengyang Niu and Mingli Song and Dacheng Tao},
  booktitle={ICML},
  year={2022}
}
This paper studies the algorithmic stability and generalizability of decentralized stochastic gradient descent (D-SGD). We prove that the consensus model learned by D-SGD is O(m/N + 1/m + λ^2)-stable in expectation in the non-convex non-smooth setting, where N is the total sample size of the whole system, m is the number of workers, and 1 − λ is the spectral gap that measures the connectivity of the communication topology. These results then deliver an O(1/N + ((m^{-1}λ^2)^{α/2} + m^{-α})/N^{1−α/2})…
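To make the quantities in the bound concrete, here is a minimal NumPy sketch (not code from the paper; the ring topology, step size, and toy quadratic objectives are illustrative assumptions) that runs the standard D-SGD update x_i ← Σ_j W_ij x_j − η∇f_i(x_i) and computes the spectral gap 1 − λ of the gossip matrix W.

```python
# Minimal, hypothetical illustration (not the paper's code): D-SGD on a toy
# quadratic, plus the spectral gap 1 - lambda of a ring gossip matrix W.
import numpy as np

def ring_gossip_matrix(m):
    """Doubly stochastic mixing matrix for a ring of m workers
    (each worker averages itself with its two neighbours)."""
    W = np.zeros((m, m))
    for i in range(m):
        W[i, i] = 1 / 3
        W[i, (i - 1) % m] = 1 / 3
        W[i, (i + 1) % m] = 1 / 3
    return W

def spectral_gap(W):
    """1 - lambda, where lambda is the second-largest eigenvalue magnitude of W."""
    eigvals = np.sort(np.abs(np.linalg.eigvals(W)))[::-1]
    return 1.0 - eigvals[1]

def dsgd_step(X, W, grads, lr):
    """One D-SGD round: gossip averaging followed by a local gradient step.
    X has shape (m, d): one row of parameters per worker."""
    return W @ X - lr * grads

if __name__ == "__main__":
    m, d, lr = 16, 5, 0.1
    rng = np.random.default_rng(0)
    targets = rng.normal(size=(m, d))        # worker i minimises ||x - targets[i]||^2 / 2
    X = rng.normal(size=(m, d))
    W = ring_gossip_matrix(m)
    print("spectral gap 1 - lambda:", spectral_gap(W))
    for _ in range(200):
        X = dsgd_step(X, W, X - targets, lr)  # gradient of the local quadratic is x - target
    print("worker disagreement:", np.linalg.norm(X - X.mean(axis=0)))
    print("distance of the averaged model to the global optimum:",
          np.linalg.norm(X.mean(axis=0) - targets.mean(axis=0)))
```

A denser topology (larger spectral gap) shrinks the worker disagreement, which is the qualitative behaviour the λ-dependent terms in the bound capture.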

DisPFL: Towards Communication-Efficient Personalized Federated Learning via Decentralized Sparse Training

TLDR
A novel personalized federated learning framework in a decentralized (peer-to-peer) communication protocol named DisPFL, which employs personalized sparse masks to customize sparse local models on the edge and achieves higher model accuracy with lower computation cost and fewer communication rounds.
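As a toy illustration of the personalized-sparse-mask idea (not DisPFL's actual algorithm; the masks, aggregation rule, and dimensions below are made up for illustration):

```python
# Toy sketch: personalized binary masks keep each client's model sparse, and a
# simple peer-to-peer aggregation averages only the coordinates both masks keep.
import numpy as np

def random_mask(d, sparsity, rng):
    """Binary mask keeping a (1 - sparsity) fraction of the d coordinates."""
    mask = np.zeros(d)
    keep = rng.choice(d, size=int(d * (1 - sparsity)), replace=False)
    mask[keep] = 1.0
    return mask

rng = np.random.default_rng(1)
d, sparsity = 10, 0.5
mask_a, mask_b = random_mask(d, sparsity, rng), random_mask(d, sparsity, rng)
theta_a, theta_b = rng.normal(size=d) * mask_a, rng.normal(size=d) * mask_b

# Average only where both sparse models have parameters; keep personalized
# values everywhere else.
shared = mask_a * mask_b
theta_a_new = shared * (theta_a + theta_b) / 2 + (mask_a - shared) * theta_a
print("client A keeps", int(mask_a.sum()), "of", d, "coordinates;",
      int(shared.sum()), "are shared with client B")
```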

References

SHOWING 1-10 OF 86 REFERENCES

A Unified Theory of Decentralized SGD with Changing Topology and Local Updates

TLDR
This paper introduces a unified convergence analysis that covers a large variety of decentralized SGD methods that so far have required different intuitions, have different applications, and have been developed separately in various communities.

Data-Dependent Stability of Stochastic Gradient Descent

TLDR
A data-dependent notion of algorithmic stability for Stochastic Gradient Descent is established, and novel generalization bounds are developed that exhibit fast convergence rates for SGD subject to a vanishing empirical risk and low noise of the stochastic gradients.
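For context, the standard notion of algorithmic (uniform) stability that these works build on, and the in-expectation generalization bound it implies (textbook material in the style of Bousquet & Elisseeff and Hardt et al., not a contribution of the cited paper):

```latex
% Uniform stability and the generalization bound it implies.
A randomized algorithm $A$ is $\epsilon$-uniformly stable if, for all datasets
$S, S'$ of size $N$ that differ in a single example,
\[
  \sup_{z} \; \mathbb{E}_{A}\bigl[\,\ell(A(S); z) - \ell(A(S'); z)\,\bigr] \;\le\; \epsilon .
\]
Uniform stability then controls the expected generalization gap:
\[
  \bigl|\,\mathbb{E}_{S,A}\bigl[R(A(S)) - \hat{R}_S(A(S))\bigr]\,\bigr| \;\le\; \epsilon ,
\]
where $R$ is the population risk and $\hat{R}_S$ the empirical risk on $S$.
```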

A Unified and Refined Convergence Analysis for Non-Convex Decentralized Learning

TLDR
The results provide improved network-topology-dependent bounds for these methods (such as Exact-Diffusion/D² and gradient tracking) compared with the existing literature, and show that these methods are often less sensitive to the network topology than DSGD, which agrees with numerical experiments.
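A rough sketch of the gradient-tracking idea referenced above (not the paper's code; the fully connected mixing matrix and quadratic objectives are simplifying assumptions): each worker maintains a tracker of the network-average gradient, which is what reduces sensitivity to the topology.

```python
# Hypothetical sketch of gradient tracking: every worker keeps a tracker Y_i
# of the network-average gradient and takes its local step along that tracker.
import numpy as np

def gradient_tracking(W, grad_fns, X0, lr, steps):
    """W: doubly stochastic mixing matrix (m x m).
    grad_fns: list of m local gradient functions.
    X0: initial parameters, shape (m, d)."""
    X = X0.copy()
    G = np.stack([g(x) for g, x in zip(grad_fns, X)])  # local gradients
    Y = G.copy()                                       # trackers start at the local gradients
    for _ in range(steps):
        X = W @ X - lr * Y                             # consensus + step along tracked gradient
        G_new = np.stack([g(x) for g, x in zip(grad_fns, X)])
        Y = W @ Y + G_new - G                          # keep tracking the average gradient
        G = G_new
    return X

if __name__ == "__main__":
    m, d = 8, 3
    rng = np.random.default_rng(0)
    targets = rng.normal(size=(m, d))
    grad_fns = [lambda x, t=t: x - t for t in targets]  # local quadratics ||x - t||^2 / 2
    W = np.full((m, m), 1 / m)                           # fully connected, for simplicity
    X = gradient_tracking(W, grad_fns, rng.normal(size=(m, d)), lr=0.2, steps=300)
    print("distance to the global optimum:", np.linalg.norm(X - targets.mean(axis=0)))
```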

Yes, Topology Matters in Decentralized Optimization: Refined Convergence and Topology Learning under Heterogeneous Data

TLDR
This paper revisits the analysis of the Decentralized Stochastic Gradient Descent (D-SGD) algorithm, a popular decentralized learning algorithm, under data heterogeneity, and argues that neighborhood heterogeneity provides a natural criterion for learning sparse, data-dependent topologies that reduce (and can even eliminate) the otherwise detrimental impact of data heterogeneity on the convergence time of D-SGD.

Stability and Generalization of the Decentralized Stochastic Gradient Descent

TLDR
Leveraging this formulation together with (non)convex optimization theory, this paper establishes the first stability and generalization guarantees for decentralized stochastic gradient descent.

Asynchronous Decentralized SGD with Quantized and Local Updates

TLDR
This paper implements and deploys the SwarmSGD algorithm, a variant of SGD that can outperform previous decentralized methods in terms of end-to-end training time and can even rival carefully tuned large-batch SGD for certain tasks.

Exponential Graph is Provably Efficient for Decentralized Deep Training

TLDR
This work proves that so-called exponential graphs, in which every node is connected to O(log(n)) neighbors (n being the total number of nodes), can achieve both fast communication and effective averaging simultaneously, and further shows that a sequence of log(n) one-peer exponential graphs can together achieve exact averaging.
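The exact-averaging claim is easy to check numerically. The sketch below (an illustrative check, not the authors' code) applies log2(n) one-peer exponential gossip rounds, where at round k node i averages with node (i + 2^k) mod n, and verifies that every node ends up with the exact mean.

```python
# Numerical check: log2(n) one-peer exponential gossip rounds give exact
# averaging when n is a power of 2.
import numpy as np

def one_peer_exponential_rounds(x):
    """Apply the sequence of one-peer exponential gossip rounds to x (length n)."""
    n = len(x)
    assert n & (n - 1) == 0, "n must be a power of 2 for exact averaging"
    x = np.asarray(x, dtype=float)
    for k in range(int(np.log2(n))):
        x = 0.5 * (x + np.roll(x, -(2 ** k)))  # average with the node 2^k hops ahead
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=16)
    out = one_peer_exponential_rounds(x)
    print("max deviation from the true mean:", np.max(np.abs(out - x.mean())))  # ~1e-16
```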

Graph-Dependent Implicit Regularisation for Distributed Stochastic Subgradient Descent

TLDR
This work proposes graph-dependent implicit regularisation strategies for distributed stochastic subgradient descent (Distributed SGD) on convex problems in multi-agent learning, which avoid the need for explicit regularisation in decentralised learning problems, such as adding constraints to the empirical risk minimisation rule.

Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses

TLDR
This work provides sharp upper and lower bounds for several forms of SGD and full-batch GD on arbitrary Lipschitz nonsmooth convex losses, and obtains the first dimension-independent generalization bounds for multi-pass SGD in the nonsmooth case.

Decentralized Stochastic Non-Convex Optimization over Weakly Connected Time-Varying Digraphs

  • Songtao Lu, C. Wu
  • Computer Science, Mathematics
    ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
TLDR
This work proposes a decentralized stochastic algorithm that converges to first-order stationary points of non-convex problems with provable convergence rates, by leveraging the perturbed push-sum protocol and gradient-tracking techniques.
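A minimal sketch of the (unperturbed) push-sum protocol underlying this line of work (illustrative only, not the authors' algorithm; the three-node digraph is an assumption): each node mixes a value and a weight with the same column-stochastic matrix, and the ratio recovers the network average even though the graph is directed.

```python
# Push-sum average consensus on a directed graph: the ratio x_i / w_i converges
# to the average of the initial values, even though A is not doubly stochastic.
import numpy as np

def push_sum(A, x0, rounds):
    """A: column-stochastic mixing matrix of a strongly connected digraph.
    x0: initial values. Returns the per-node estimates x_i / w_i."""
    x = np.asarray(x0, dtype=float).copy()
    w = np.ones_like(x)
    for _ in range(rounds):
        x = A @ x   # each node pushes shares of its value along its out-edges
        w = A @ w   # and pushes shares of its weight along the same edges
    return x / w

if __name__ == "__main__":
    # Digraph 0 -> {1, 2}, 1 -> {2}, 2 -> {0}; each node also keeps a share for
    # itself, so columns sum to 1 while rows do not (the graph is directed).
    A = np.array([[1/3, 0.0, 0.5],
                  [1/3, 0.5, 0.0],
                  [1/3, 0.5, 0.5]])
    x0 = np.array([1.0, 4.0, -2.0])
    print("push-sum estimates:", push_sum(A, x0, rounds=100))
    print("true average:      ", x0.mean())  # 1.0
```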
...