Corpus ID: 202888986

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

  • Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
  • Published 2020
  • Computer Science
  • ArXiv
  • Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. [...] Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT. We also use a self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on…
    735 Citations
    MC-BERT: Efficient Language Pre-Training via a Meta Controller
    • 1
    Poor Man's BERT: Smaller and Faster Transformer Models
    • 12
    DeBERTa: Decoding-enhanced BERT with Disentangled Attention
    • 5
    ConvBERT: Improving BERT with Span-based Dynamic Convolution
    • 1
    TinyBERT: Distilling BERT for Natural Language Understanding
    • 122
    • Highly Influenced
    Structured Pruning of Large Language Models
    • 17
    Fast and Accurate Deep Bidirectional Language Representations for Unsupervised Learning
    SQuAD 2.0 Based on ALBERT and Ensemble


    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    • 11,783
    • Highly Influential
    RoBERTa: A Robustly Optimized BERT Pretraining Approach
    • 1,772
    Efficient Training of BERT by Progressively Stacking
    • 16
    XLNet: Generalized Autoregressive Pretraining for Language Understanding
    • 1,638
    • Highly Influential
    Language Models are Unsupervised Multitask Learners
    • 1,986
    Adaptive Input Representations for Neural Language Modeling
    • 97