Corpus ID: 222208763

Interlocking Backpropagation: Improving depthwise model-parallelism

  • Aidan N. Gomez, Oscar Key, Stephen Gou, Nick Frosst, Jeff Dean, Yarin Gal
  • Published 2020
  • Computer Science
  • ArXiv
  • The number of parameters in state-of-the-art neural networks has increased drastically in recent years. This surge of interest in large-scale neural networks has motivated the development of new distributed training strategies that enable such models. One such strategy is model-parallel distributed training. Unfortunately, model-parallelism suffers from poor resource utilisation, which leads to wasted resources. In this work, we improve upon recent developments in an idealised model-parallel…
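The poor resource utilisation the abstract refers to can be illustrated with a minimal sketch (hypothetical, not the paper's code): in naive depthwise model-parallelism, a network is split into consecutive stages, one per device, and each device sits idle except while its own forward or backward pass runs. The function and constant names below are assumptions for illustration.

```python
# Hypothetical sketch: utilisation of naive depthwise model-parallelism.
# A batch passes forward through stages 0..S-1, then backward through
# stages S-1..0; at any moment exactly one stage is busy.

def naive_pipeline_utilisation(num_stages: int) -> float:
    """Fraction of stage-timeslots doing useful work for one batch."""
    # One forward step plus one backward step per stage.
    busy_slots = 2 * num_stages
    # During each of those steps, all num_stages devices hold a timeslot,
    # but only one of them is active.
    total_slots = busy_slots * num_stages
    return busy_slots / total_slots  # simplifies to 1 / num_stages

print(naive_pipeline_utilisation(4))  # → 0.25
```

Under these assumptions utilisation falls as 1/S with the number of stages S, which is why schemes such as pipelining micro-batches (GPipe) or relaxing end-to-end gradient flow are attractive.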
