Corpus ID: 215737171

Longformer: The Long-Document Transformer

@article{Beltagy2020LongformerTL,
  title={Longformer: The Long-Document Transformer},
  author={Iz Beltagy and Matthew E. Peters and Arman Cohan},
  journal={ArXiv},
  year={2020},
  volume={abs/2004.05150}
}
Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer. Longformer's attention mechanism is a drop-in replacement for the standard self-attention and combines a local windowed attention with a task…
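As a rough illustration of the attention pattern the abstract describes (a local sliding window per token plus a few task-motivated global tokens), the NumPy sketch below builds the corresponding boolean pattern. This is only a sketch of the *pattern*, not the paper's implementation: the Longformer uses banded kernels so memory stays proportional to sequence length times window size, whereas this demo materialises the full matrix. The function name, window size, and choice of global indices are illustrative assumptions.

```python
import numpy as np

def longformer_attention_pattern(seq_len, window, global_idx):
    """Boolean pattern: True where query i may attend to key j.

    Each token attends to a symmetric local window around itself;
    a handful of task-specific tokens (e.g. position 0 for [CLS])
    attend to, and are attended by, every position.
    """
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    half = window // 2
    for i in range(seq_len):
        lo, hi = max(0, i - half), min(seq_len, i + half + 1)
        mask[i, lo:hi] = True          # local sliding-window attention
    for g in global_idx:
        mask[g, :] = True              # global token attends everywhere
        mask[:, g] = True              # every token attends to the global token
    return mask

pattern = longformer_attention_pattern(seq_len=4096, window=512, global_idx=[0])
print(pattern.sum(), "allowed attention pairs out of", 4096 * 4096)
```

Each token is allowed roughly `window + len(global_idx)` keys, so the number of attended pairs grows linearly with sequence length instead of quadratically, which is the scaling claim made in the abstract.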
Citations

Memformer: The Memory-Augmented Transformer
Linformer: Self-Attention with Linear Complexity
Random Feature Attention
Memory Transformer
Hierarchical Learning for Generation with Long Source Sequences
Learning Hard Retrieval Cross Attention for Transformer
A Practical Survey on Faster and Lighter Transformers
Finetuning Pretrained Transformers into RNNs
Lifting Sequence Length Limitations of NLP Models using Autoencoders
...
