Corpus ID: 51876267

Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes

@article{Jia2018HighlySD,
  title={Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes},
  author={Xianyan Jia and Shutao Song and Wei He and Yangzihao Wang and Haidong Rong and Feihu Zhou and Liqiang Xie and Zhenyu Guo and Yuanzhou Yang and Liwei Yu and Tiegang Chen and Guangxiao Hu and Shaohuai Shi and Xiaowen Chu},
  journal={ArXiv},
  year={2018},
  volume={abs/1807.11205}
}
  • Xianyan Jia, Shutao Song, +11 authors Xiaowen Chu
  • Published 2018
  • Computer Science, Mathematics
  • ArXiv
  • Synchronized stochastic gradient descent (SGD) optimizers with data parallelism are widely used in training large-scale deep neural networks. Although using larger mini-batch sizes can improve the system scalability by reducing the communication-to-computation ratio, it may hurt the generalization ability of the models. To this end, we build a highly scalable deep learning training system for dense GPU clusters with three main contributions: (1) We propose a mixed-precision training method that… (a minimal sketch of the general mixed-precision recipe appears below)
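
The truncated abstract and the referenced "Mixed Precision Training" paper point to the standard FP16/FP32 recipe: run the forward and backward passes in FP16, keep an FP32 master copy of the weights (so updates smaller than FP16 resolution are not lost), and scale the loss so small gradients do not underflow. The sketch below only illustrates that general recipe on a toy least-squares problem; the function name, model, and hyperparameters (learning rate, loss scale) are assumptions for the example, not the authors' training system, which further combines large mini-batches and optimized communication across a dense GPU cluster.

```python
# Illustrative sketch of generic mixed-precision SGD (FP16 compute, FP32 master
# weights, static loss scaling) on a toy least-squares model. This is NOT the
# paper's system; all names and hyperparameters here are assumptions.
import numpy as np

def mixed_precision_sgd_step(master_w, x, y, lr=0.1, loss_scale=128.0):
    """One SGD step: forward/backward in FP16, weight update in FP32."""
    # Cast the FP32 master weights and the mini-batch to FP16 for compute.
    w16 = master_w.astype(np.float16)
    x16 = x.astype(np.float16)
    y16 = y.astype(np.float16)

    # Forward pass in FP16: linear prediction and residual error.
    pred = x16 @ w16
    err = pred - y16

    # Backward pass of the *scaled* mean-squared-error loss. Scaling keeps
    # small per-example gradients from underflowing to zero in FP16.
    dpred = (loss_scale / len(y16)) * err      # d(scale * loss) / d pred
    grad16 = x16.T @ dpred                     # d(scale * loss) / d w, in FP16

    # Un-scale in FP32 and apply the update to the FP32 master weights.
    grad32 = grad16.astype(np.float32) / loss_scale
    master_w -= lr * grad32
    return master_w

# Usage: recover a known weight vector from synthetic data.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5, 3.0], dtype=np.float32)
w = np.zeros(4, dtype=np.float32)
for _ in range(300):
    xb = rng.normal(size=(32, 4)).astype(np.float32)
    yb = xb @ true_w
    w = mixed_precision_sgd_step(w, xb, yb)
print(w)  # close to [1, -2, 0.5, 3], up to FP16 rounding error
```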

    Citations

    Publications citing this paper.
    Showing 1-10 of 151 citations.

    • Scalable and Practical Natural Gradient for Large-Scale Deep Learning (cites background)
    • Improving Scalability of Parallel CNN Training by Adjusting Mini-Batch Size at Run-Time (cites background)
    • Large-batch training for LSTM and beyond (cites methods and background)
    • ImageNet/ResNet-50 Training in 224 Seconds (cites methods and background; highly influenced)
    • Parallax: Sparsity-aware Data Parallel Training of Deep Neural Networks (cites methods and background)
    • Large Batch Training Does Not Need Warmup (cites methods)
    • Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks (cites background and methods)

    Citation Statistics

    • 15 Highly Influenced Citations

    • Averaged 50 citations per year from 2018 through 2020

    References

    Publications referenced by this paper.
    Showing 1-10 of 35 references.

    Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs

    • Shaohuai Shi, Xiaowen Chu
    • Computer Science
    • 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech)
    • 2018

    Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters

    Mixed Precision Training
