Corpus ID: 231847231

Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention

@article{Xiong2021NystrmformerAN,
  title={Nystr{\"o}mformer: A Nystr{\"o}m-Based Algorithm for Approximating Self-Attention},
  author={Yunyang Xiong and Zhanpeng Zeng and Rudrasis Chakraborty and Mingxing Tan and Glenn M. Fung and Yin Li and V. Singh},
  journal={ArXiv},
  year={2021},
  volume={abs/2102.03902}
}
Transformers have emerged as a powerful tool for a broad range of natural language processing tasks. A key component that drives the impressive performance of Transformers is the self-attention mechanism, which encodes the influence or dependence of other tokens on each specific token. While beneficial, the quadratic complexity of self-attention with respect to input sequence length has limited its application to longer sequences, a topic being actively studied in the community. To address this…
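The abstract only sketches the idea, so below is a minimal, self-contained NumPy illustration of a Nyström-style approximation of softmax self-attention: forming the n x n attention matrix costs O(n^2), while reconstructing it from m << n landmark queries and keys keeps the cost roughly linear in n. The landmark choice (segment means), the exact pseudoinverse via np.linalg.pinv, and all function names here are illustrative assumptions for this sketch, not the paper's implementation.

import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def full_attention(Q, K, V):
    # Standard self-attention: the (n, n) score matrix makes this
    # O(n^2) in both time and memory.
    n, d = Q.shape
    S = softmax(Q @ K.T / np.sqrt(d), axis=-1)  # (n, n)
    return S @ V


def nystrom_attention(Q, K, V, m=32):
    # Nystrom-style sketch (illustrative, not the paper's implementation):
    # approximate the (n, n) softmax matrix from m landmark queries/keys,
    # so the full matrix is never materialized; cost is roughly O(n * m).
    n, d = Q.shape
    assert n % m == 0, "for this sketch, n must be divisible by m"

    # Landmarks as segment means (one simple, assumed choice).
    Q_l = Q.reshape(m, n // m, d).mean(axis=1)  # (m, d)
    K_l = K.reshape(m, n // m, d).mean(axis=1)  # (m, d)

    F = softmax(Q @ K_l.T / np.sqrt(d), axis=-1)    # (n, m)
    A = softmax(Q_l @ K_l.T / np.sqrt(d), axis=-1)  # (m, m)
    B = softmax(Q_l @ K.T / np.sqrt(d), axis=-1)    # (m, n)

    # Exact Moore-Penrose pseudoinverse of the small (m, m) block;
    # kept simple here rather than using a cheaper iterative scheme.
    return F @ np.linalg.pinv(A) @ (B @ V)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 256, 32
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    exact = full_attention(Q, K, V)
    approx = nystrom_attention(Q, K, V, m=32)
    print("mean abs error:", float(np.abs(exact - approx).mean()))

Running the script prints the mean absolute error between the exact and approximate attention outputs on random inputs, so the accuracy/cost trade-off of the landmark count m can be inspected directly.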
3 Citations

A Practical Survey on Faster and Lighter Transformers
U-Net Transformer: Self and Cross Attention for Medical Image Segmentation
