Corpus ID: 229924221

Shortformer: Better Language Modeling using Shorter Inputs

@article{Press2020ShortformerBL,
  title={Shortformer: Better Language Modeling using Shorter Inputs},
  author={Ofir Press and Noah A. Smith and Mike Lewis},
  journal={ArXiv},
  year={2020},
  volume={abs/2012.15832}
}
We explore the benefits of decreasing the input length of transformers. First, we show that initially training the model on short subsequences, before moving on to longer ones, both reduces overall training time and, surprisingly, gives a large improvement in perplexity. We then show how to improve the efficiency of recurrence methods in transformers, which let models condition on previously processed tokens (when generating sequences that are larger than the maximal length that the transformer…
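
Below is a minimal PyTorch sketch (not the authors' code) of the staged-training idea described in the abstract: train first on short subsequences, then switch to longer ones. The stage lengths, epoch counts, batch size, optimizer settings, and the assumption that model(x) returns next-token logits of shape (batch, seq_len, vocab_size) are all illustrative, not the paper's actual schedule.

import torch
from torch.utils.data import DataLoader, Dataset

class LMChunks(Dataset):
    """Splits a long stream of token ids (a 1-D LongTensor) into fixed-length subsequences."""
    def __init__(self, tokens, seq_len):
        self.tokens, self.seq_len = tokens, seq_len

    def __len__(self):
        return (len(self.tokens) - 1) // self.seq_len

    def __getitem__(self, i):
        start = i * self.seq_len
        x = self.tokens[start : start + self.seq_len]
        y = self.tokens[start + 1 : start + self.seq_len + 1]  # next-token targets
        return x, y

def train_staged(model, tokens, stages=((128, 2), (3072, 8)), device="cpu"):
    # Each (seq_len, epochs) pair is one stage: short inputs first, longer inputs afterwards.
    # The specific lengths and epoch counts here are placeholders, not the paper's schedule.
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.to(device).train()
    for seq_len, epochs in stages:
        loader = DataLoader(LMChunks(tokens, seq_len), batch_size=4, shuffle=True)
        for _ in range(epochs):
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                logits = model(x)  # assumed shape: (batch, seq_len, vocab_size)
                loss = loss_fn(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
                opt.zero_grad()
                loss.backward()
                opt.step()

Because self-attention cost grows quadratically with input length, the short-input stage is also substantially cheaper per token, which is why the staged schedule reduces overall training time.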
5 Citations

Finetuning Pretrained Transformers into RNNs
  • Jungo Kasai, Hao Peng, +6 authors Noah A. Smith
  • ArXiv, 2021
EfficientNetV2: Smaller Models and Faster Training
Position Information in Transformers: An Overview
A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation
  • Tianyu Liu, Yizhe Zhang, +4 authors Bill Dolan
  • 2021
Go Forth and Prosper: Language Modeling with Ancient Textual History

References

Showing 1-10 of 23 references

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Generating Long Sequences with Sparse Transformers
Reformer: The Efficient Transformer
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Improving Transformer Models by Reordering their Sublayers
Using the Output Embedding to Improve Language Models
Adaptive Input Representations for Neural Language Modeling
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Rethinking Positional Encoding in Language Pre-training
Longformer: The Long-Document Transformer