Corpus ID: 226281978

Long Range Arena: A Benchmark for Efficient Transformers

@article{Tay2020LongRA,
  title={Long Range Arena: A Benchmark for Efficient Transformers},
  author={Yi Tay and M. Dehghani and Samira Abnar and Y. Shen and Dara Bahri and Philip Pham and J. Rao and Liu Yang and Sebastian Ruder and Donald Metzler},
  journal={ArXiv},
  year={2020},
  volume={abs/2011.04006}
}
Transformers do not scale well to long sequence lengths, largely because of the quadratic complexity of self-attention. In recent months, a wide spectrum of efficient, fast Transformers have been proposed to tackle this problem, more often than not claiming superior or comparable model quality to vanilla Transformer models. To date, there is no well-established consensus on how to evaluate this class of models. Moreover, inconsistent benchmarking on a wide spectrum of tasks and datasets…
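
The quadratic cost mentioned in the abstract comes from the attention score matrix itself. Below is a minimal NumPy sketch of vanilla scaled dot-product attention, not taken from the paper; the function name naive_attention and the shapes are illustrative assumptions. It shows that a single head materializes a seq_len × seq_len score matrix, which is exactly what the efficient Transformers benchmarked in this work try to avoid.

import numpy as np

def naive_attention(q, k, v):
    # q, k, v: (seq_len, d_model) arrays for a single head.
    # The score matrix q @ k.T is (seq_len, seq_len), so compute and
    # memory grow quadratically with sequence length.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                   # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                              # (seq_len, d_model)

rng = np.random.default_rng(0)
x = rng.normal(size=(4096, 64))
out = naive_attention(x, x, x)                      # self-attention
print(out.shape)                                    # (4096, 64)

# Doubling seq_len quadruples the number of attention scores per head:
for n in (1024, 2048, 4096):
    print(f"{n} tokens -> {n * n:,} scores")

At 4,096 tokens a single head already scores roughly 16.8 million query-key pairs, which is why long-sequence tasks quickly become impractical for the vanilla model.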
    3 Citations

    • Sub-Linear Memory: How to Make Performers SLiM
    • Cross-Document Language Modeling
