Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
@article{Li2020TrainLT,
  title   = {Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers},
  author  = {Zhuohan Li and Eric Wallace and Sheng Shen and Kevin Lin and K. Keutzer and D. Klein and J. Gonzalez},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2002.11794}
}
Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute: self-supervised pretraining and high-resource machine translation. We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models…
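The abstract's recipe, train a large Transformer for relatively few steps and then compress it for inference, can be illustrated with a minimal PyTorch sketch. Everything below (model sizes, the toy data and classification head, the 30% magnitude-pruning ratio, and dynamic int8 quantization) is an illustrative assumption for the sketch, not the paper's exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# "Train large": a wide, deep Transformer encoder trained briefly on toy data.
# All sizes and the toy task are assumptions for illustration only.
d_model, nhead, num_layers = 512, 8, 12
encoder_layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=2048)
model = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
head = nn.Linear(d_model, 2)  # toy classification head

opt = torch.optim.Adam(list(model.parameters()) + list(head.parameters()), lr=1e-4)
for step in range(10):  # stop training early instead of running to convergence
    x = torch.randn(16, 8, d_model)      # (seq_len, batch, d_model) random batch
    y = torch.randint(0, 2, (8,))        # random binary labels
    logits = head(model(x).mean(dim=0))  # mean-pool over the sequence
    loss = nn.functional.cross_entropy(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# "Then compress": magnitude-prune the Linear weights, then quantize to int8.
model.eval()
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)  # zero 30% of weights
        prune.remove(module, "weight")                            # make the mask permanent

compressed = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(compressed)  # Linear layers replaced by dynamically quantized int8 versions
```

The sketch only shows the mechanics (short training run, then pruning and quantization); the paper's point is that the large-then-compressed model reaches better accuracy per unit of training compute and inference cost than a small model trained to convergence.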
37 Citations
- Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search. ArXiv, 2020.
- General Purpose Text Embeddings from Pre-trained Language Models for Scalable Inference. EMNLP, 2020.
- Principal Component Networks: Parameter Reduction Early in Training. ArXiv, 2020.
- Weight Distillation: Transferring the Knowledge in Neural Network Parameters. ArXiv, 2020.
References
Showing 5 of 78 references
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. ArXiv, 2019. 608 citations.
- To prune, or not to prune: exploring the efficacy of pruning for model compression. ICLR, 2018. 312 citations.
- Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. ArXiv, 2017. 1,464 citations. Highly influential.
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. ICLR, 2020. 992 citations. Highly influential.
- Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints. ICLR, 2020. 9 citations. Highly influential.