Corpus ID: 222208763

Interlocking Backpropagation: Improving depthwise model-parallelism

@article{Gomez2020InterlockingBI,
  title={Interlocking Backpropagation: Improving depthwise model-parallelism},
  author={Aidan N. Gomez and Oscar Key and Stephen Gou and Nick Frosst and J. Dean and Y. Gal},
  journal={ArXiv},
  year={2020},
  volume={abs/2010.04116}
}
The number of parameters in state-of-the-art neural networks has increased drastically in recent years. This surge of interest in large-scale neural networks has motivated the development of new distributed training strategies that make such models feasible. One such strategy is model-parallel distributed training. Unfortunately, model-parallelism suffers from poor resource utilisation: devices hosting different parts of the model sit idle while they wait for one another. In this work, we improve upon recent developments in an idealised model-parallel…
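
As background for the setting the abstract describes, the sketch below illustrates plain depthwise model-parallelism in PyTorch: consecutive layers are partitioned into stages, each stage placed on its own device, with activations crossing the device boundary in the forward pass and gradients crossing back during backpropagation. This is a minimal sketch of the baseline setting only, not the paper's interlocking scheme; the two-stage split, layer sizes, and device names are illustrative assumptions.

```python
# Minimal sketch of depthwise model-parallelism (baseline, not the paper's
# interlocking scheme). Layers are split into consecutive stages, each on its
# own device. Device choices below are assumptions; we fall back to CPU when
# fewer GPUs are available.
import torch
import torch.nn as nn

dev0 = torch.device("cuda:0" if torch.cuda.device_count() > 0 else "cpu")
dev1 = torch.device("cuda:1" if torch.cuda.device_count() > 1 else "cpu")

class TwoStageNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Stage 1: early layers live on the first device.
        self.stage1 = nn.Sequential(nn.Linear(784, 512), nn.ReLU()).to(dev0)
        # Stage 2: later layers live on the second device.
        self.stage2 = nn.Linear(512, 10).to(dev1)

    def forward(self, x):
        # Activations are transferred at the stage boundary. With naive
        # end-to-end training, each stage waits on the other, which is the
        # utilisation problem the abstract refers to.
        h = self.stage1(x.to(dev0))
        return self.stage2(h.to(dev1))

model = TwoStageNet()
x = torch.randn(32, 784)
loss = model(x).sum()
loss.backward()  # backprop crosses the device boundary in reverse order
```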
