MG-WFBP: Efficient Data Communication for Distributed Synchronous SGD Algorithms

@article{Shi2019MGWFBPED,
  title={MG-WFBP: Efficient Data Communication for Distributed Synchronous SGD Algorithms},
  author={Shaohuai Shi and Xiaowen Chu},
  journal={IEEE INFOCOM 2019 - IEEE Conference on Computer Communications},
  year={2019},
  pages={172-180}
}
  • Shaohuai Shi, Xiaowen Chu
  • Published 2019
  • Computer Science
  • IEEE INFOCOM 2019 - IEEE Conference on Computer Communications
  • Distributed synchronous stochastic gradient descent has been widely used to train deep neural networks on computer clusters. With the increase of computational power, network communication has become a limiting factor on system scalability. In this paper, we observe that many deep neural networks have a large number of layers with only a small amount of data to be communicated. Based on the fact that merging some short communication tasks into a single one may reduce the overall…
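The merging idea in the abstract can be illustrated with the cost model commonly used for this line of work: an all-reduce of m bytes is modeled as T(m) = alpha + beta * m, where alpha is the per-message startup latency and beta is the per-byte transfer time, so communicating k small gradients separately pays the startup cost k times while one merged buffer pays it once. The Python sketch below only demonstrates that trade-off; the values of ALPHA and BETA, the layer-size profile, and the threshold-based greedy_merge heuristic are illustrative assumptions, not the paper's MG-WFBP merging algorithm (which, per the paper, decides what to merge from its measured communication model rather than a fixed size threshold, so that communication can still overlap with backpropagation).

# Illustrative sketch only: compares per-layer (WFBP-style) all-reduce cost with
# a merged schedule under the alpha-beta communication model T(m) = alpha + beta*m.

ALPHA = 50e-6   # assumed startup latency per all-reduce call, in seconds
BETA = 1e-9     # assumed per-byte transfer time (~8 Gbps effective bandwidth)

def allreduce_time(nbytes):
    """Modeled time of one all-reduce of `nbytes` bytes."""
    return ALPHA + BETA * nbytes

def layerwise_cost(grad_sizes):
    """Wait-free backpropagation style: one all-reduce per layer."""
    return sum(allreduce_time(m) for m in grad_sizes)

def greedy_merge(grad_sizes, threshold_bytes=262_144):
    """Toy heuristic: pack consecutive gradients into buffers of at least
    `threshold_bytes` so the startup latency is amortized."""
    groups, current = [], []
    for m in grad_sizes:
        current.append(m)
        if sum(current) >= threshold_bytes:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups

def merged_cost(groups):
    """Modeled time when each merged buffer is all-reduced as one message."""
    return sum(allreduce_time(sum(g)) for g in groups)

if __name__ == "__main__":
    # Hypothetical per-layer gradient sizes (bytes): many tiny layers plus a
    # few large ones, mirroring the observation the paper starts from.
    sizes = [4_096] * 100 + [1_048_576] * 10
    print(f"layer-wise (WFBP) time : {layerwise_cost(sizes) * 1e3:.2f} ms")
    print(f"merged all-reduce time : {merged_cost(greedy_merge(sizes)) * 1e3:.2f} ms")

Under these assumed parameters the merged schedule saves roughly the startup latency of every message it eliminates, which is exactly the effect that matters when a network has many small layers.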
    26 Citations

    • MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning
    • A Distributed Synchronous SGD Algorithm with Global Top-k Sparsification for Low Bandwidth Networks (35 citations)
    • Communication-Efficient Distributed Deep Learning with Merged Gradient Sparsification on GPUs (8 citations)
    • Communication-Efficient Distributed Deep Learning: A Comprehensive Survey (11 citations)
    • Communication optimization strategies for distributed deep neural network training: A survey (1 citation)
    • A Quantitative Survey of Communication Optimizations in Distributed Deep Learning (2 citations)
    • Layer-wise Adaptive Gradient Sparsification for Distributed Deep Learning with Convergence Guarantees (4 citations)
    • Communication Optimization Strategies for Distributed Deep Learning: A Survey (5 citations)
    • Preemptive All-reduce Scheduling for Expediting Distributed DNN Training (6 citations; highly influenced by this paper)
    • Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters

    References (showing 1-10 of 31)

    • A DAG Model of Synchronous Stochastic Gradient Descent in Distributed Deep Learning (9 citations)
    • A Distributed Synchronous SGD Algorithm with Global Top-k Sparsification for Low Bandwidth Networks (35 citations)
    • Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters (153 citations; highly influential)
    • Large Scale Distributed Deep Networks (2,541 citations)
    • TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning (425 citations)
    • GeePS: scalable deep learning on distributed GPUs with a GPU-specialized parameter server (196 citations)
    • Ako: Decentralised Deep Learning with Partial Gradient Exchange (60 citations)
    • Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training (452 citations)
    • Round-Robin Synchronization: Mitigating Communication Bottlenecks in Parameter Servers. C. Chen, W. Wang, B. Li. IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, 2019 (21 citations)
    • Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes (216 citations)