Interlocking Backpropagation: Improving depthwise model-parallelism
@article{Gomez2020InterlockingBI,
  title   = {Interlocking Backpropagation: Improving depthwise model-parallelism},
  author  = {Aidan N. Gomez and Oscar Key and Stephen Gou and Nick Frosst and Jeff Dean and Yarin Gal},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2010.04116}
}
The number of parameters in state-of-the-art neural networks has increased drastically in recent years. This surge of interest in large-scale neural networks has motivated the development of new distributed training strategies that make such models feasible. One such strategy is model-parallel distributed training. Unfortunately, model-parallelism suffers from poor resource utilisation, leaving compute idle. In this work, we improve upon recent developments in an idealised model-parallel…
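To make the utilisation issue concrete, here is a minimal PyTorch sketch (not the paper's code) of naive depthwise model-parallelism: a small MLP is split into two sequential stages that would sit on separate devices, so during both the forward and backward pass only one stage is active at a time. The two-way split and layer sizes are illustrative assumptions, and the example runs on CPU for simplicity.

```python
import torch
import torch.nn as nn

# Hypothetical two-way depthwise split of a small MLP.
stage0 = nn.Sequential(nn.Linear(32, 64), nn.ReLU())  # would be placed on device 0
stage1 = nn.Sequential(nn.Linear(64, 10))             # would be placed on device 1

x = torch.randn(8, 32)
target = torch.randint(0, 10, (8,))

# Forward pass: stage1 cannot start until stage0 has finished,
# so the first device idles while the second computes.
h = stage0(x)
logits = stage1(h)

# Backward pass: gradients flow stage1 -> stage0, so the second
# device idles in turn while the first computes its gradients.
loss = nn.functional.cross_entropy(logits, target)
loss.backward()
```

This strictly sequential dependency between stages is the source of the wasted compute that the paper's interlocking strategies aim to reduce.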