Character-Level Language Modeling with Deeper Self-Attention

@inproceedings{AlRfou2018CharacterLevelLM,
  title={Character-Level Language Modeling with Deeper Self-Attention},
  author={Rami Al-Rfou and Dokook Choe and Noah Constant and Mandy Guo and Llion Jones},
  booktitle={AAAI},
  year={2018}
}
LSTMs and other RNN variants have shown strong performance on character-level language modeling. These models are typically trained using truncated backpropagation through time, and it is common to assume that their success stems from their ability to remember long-term contexts. In this paper, we show that a deep (64-layer) transformer model with fixed context outperforms RNN variants by a large margin, achieving state of the art on two popular benchmarks: 1.13 bits per character on text8 and…
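
As a concrete illustration of the fixed-context setup described in the abstract, here is a minimal PyTorch sketch of a character-level transformer language model that predicts each next character within a fixed-length window, with the training loss converted from nats to bits per character. This is not the authors' implementation: the class name CharTransformerLM and all hyperparameters (CONTEXT_LEN, D_MODEL, N_LAYERS, N_HEADS) are illustrative placeholders, and the paper's 64-layer model additionally relies on auxiliary losses at intermediate layers and positions that are omitted here.

# Minimal sketch (not the authors' implementation): a character-level
# transformer language model with a fixed context window, in PyTorch.
# All hyperparameters below are illustrative; the paper scales depth to
# 64 layers and uses auxiliary losses to make training tractable.
import math
import torch
import torch.nn as nn

CONTEXT_LEN = 128   # fixed context length (illustrative)
VOCAB_SIZE  = 256   # one token per byte/character
D_MODEL     = 256
N_LAYERS    = 4     # the paper uses 64 layers
N_HEADS     = 8

class CharTransformerLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, D_MODEL)
        self.pos = nn.Parameter(torch.zeros(CONTEXT_LEN, D_MODEL))
        layer = nn.TransformerEncoderLayer(
            d_model=D_MODEL, nhead=N_HEADS,
            dim_feedforward=4 * D_MODEL, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=N_LAYERS)
        self.out = nn.Linear(D_MODEL, VOCAB_SIZE)

    def forward(self, x):
        # x: (batch, seq) integer character ids, seq <= CONTEXT_LEN
        seq = x.size(1)
        # causal mask: each position attends only to earlier characters
        mask = torch.triu(
            torch.full((seq, seq), float("-inf"), device=x.device),
            diagonal=1)
        h = self.embed(x) * math.sqrt(D_MODEL) + self.pos[:seq]
        h = self.blocks(h, mask=mask)
        return self.out(h)  # (batch, seq, VOCAB_SIZE) next-char logits

# One training-style step on a dummy batch: predict every next character;
# the cross-entropy loss (in nats) becomes bits per character via ln(2).
model = CharTransformerLM()
data = torch.randint(0, VOCAB_SIZE, (2, CONTEXT_LEN + 1))
logits = model(data[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, VOCAB_SIZE), data[:, 1:].reshape(-1))
bpc = loss.item() / math.log(2)
print(f"loss = {loss.item():.3f} nats = {bpc:.3f} bits per character")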

Citations

Publications citing this paper.
SHOWING 1-10 OF 53 CITATIONS

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

CITES BACKGROUND, METHODS & RESULTS
HIGHLY INFLUENCED

Transformer-XL: Language Modeling

  • 2018
CITES BACKGROUND, METHODS & RESULTS
HIGHLY INFLUENCED

On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention

Junyeop Lee, Sungrae Park, +3 authors Hwalsuk Lee
  • ArXiv
  • 2019
CITES BACKGROUND & METHODS
HIGHLY INFLUENCED

Deep Equilibrium Models

CITES BACKGROUND
HIGHLY INFLUENCED

Language Models are Unsupervised Multitask Learners

CITES BACKGROUND
HIGHLY INFLUENCED

CITATION STATISTICS

  • 11 Highly Influenced Citations

  • Averaged 27 citations per year from 2018 through 2019