ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
@article{Lan2020ALBERTAL,
  title   = {ALBERT: A Lite BERT for Self-supervised Learning of Language Representations},
  author  = {Zhenzhong Lan and Mingda Chen and Sebastian Goodman and Kevin Gimpel and Piyush Sharma and Radu Soricut},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/1909.11942}
}
Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. [...] Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT. We also use a self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on…
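The ideas summarized in the abstract, the two parameter-reduction techniques from the full paper (factorized embedding parameterization and cross-layer parameter sharing) and the inter-sentence coherence objective (sentence-order prediction, SOP), can be sketched roughly as below. This is a minimal TensorFlow 2 / Keras illustration, not the official repository code; the class names and the ALBERT-base-like toy dimensions (vocabulary 30,000, E=128, H=768, 12 shared layers, 12 heads) are assumptions chosen for the example.

```python
# Minimal sketch of ALBERT's parameter-reduction ideas and SOP head.
# Not the official implementation; class names and dimensions are illustrative.
import tensorflow as tf


class FactorizedEmbedding(tf.keras.layers.Layer):
    """Factorize the V x H embedding matrix into V x E and E x H (E << H)."""

    def __init__(self, vocab_size, embedding_size, hidden_size):
        super().__init__()
        self.word_emb = tf.keras.layers.Embedding(vocab_size, embedding_size)
        self.project = tf.keras.layers.Dense(hidden_size)

    def call(self, token_ids):
        return self.project(self.word_emb(token_ids))


class SharedTransformerEncoder(tf.keras.layers.Layer):
    """Cross-layer parameter sharing: one Transformer block reused for every layer."""

    def __init__(self, hidden_size, num_heads, num_layers):
        super().__init__()
        self.num_layers = num_layers
        self.attention = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=hidden_size // num_heads)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(4 * hidden_size, activation="gelu"),
            tf.keras.layers.Dense(hidden_size),
        ])
        self.norm1 = tf.keras.layers.LayerNormalization()
        self.norm2 = tf.keras.layers.LayerNormalization()

    def call(self, x):
        for _ in range(self.num_layers):  # same weights applied at every layer
            x = self.norm1(x + self.attention(x, x))
            x = self.norm2(x + self.ffn(x))
        return x


class SentenceOrderHead(tf.keras.layers.Layer):
    """Binary classifier on the first ([CLS]) position: original vs. swapped segment order."""

    def __init__(self):
        super().__init__()
        self.classifier = tf.keras.layers.Dense(2)

    def call(self, sequence_output):
        return self.classifier(sequence_output[:, 0, :])


# Toy usage with ALBERT-base-like shapes (assumed for illustration).
embed = FactorizedEmbedding(vocab_size=30000, embedding_size=128, hidden_size=768)
encoder = SharedTransformerEncoder(hidden_size=768, num_heads=12, num_layers=12)
sop_head = SentenceOrderHead()

token_ids = tf.random.uniform((2, 16), maxval=30000, dtype=tf.int32)
sop_logits = sop_head(encoder(embed(token_ids)))  # shape (2, 2)
```

Because the single Transformer block is reused across all layers, the encoder's parameter count is independent of depth, which is what lets the models scale much better than the original BERT.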
Supplemental Code
Github Repo (via Papers with Code): ALBERT model Pretraining and Fine Tuning using TF2.0
905 Citations
- Undivided Attention: Are Intermediate Layers Necessary for BERT? ArXiv, 2020. Highly Influenced.
- TinyBERT: Distilling BERT for Natural Language Understanding. EMNLP, 2020. 163 citations. Highly Influenced.
- Fast and Accurate Deep Bidirectional Language Representations for Unsupervised Learning. ACL, 2020. Highly Influenced.
- BURT: BERT-inspired Universal Representation from Learning Meaningful Segment. ArXiv, 2020.
References
Showing 1-10 of 66 references
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT, 2019. 13,834 citations. Highly Influential.
- Well-Read Students Learn Better: On the Importance of Pre-training Compact Models. 2019. 65 citations.
- StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding. ICLR, 2020. 51 citations.
- Well-Read Students Learn Better: The Impact of Student Initialization on Knowledge Distillation. ArXiv, 2019. 41 citations.
- XLNet: Generalized Autoregressive Pretraining for Language Understanding. NeurIPS, 2019. 1,934 citations. Highly Influential.