Corpus ID: 232092546

OmniNet: Omnidirectional Representations from Transformers

@article{Tay2021OmniNetOR,
  title={OmniNet: Omnidirectional Representations from Transformers},
  author={Yi Tay and Mostafa Dehghani and Vamsi Aribandi and Jai Gupta and Philip Pham and Zhen Qin and Dara Bahri and Da-Cheng Juan and Donald Metzler},
  journal={ArXiv},
  year={2021},
  volume={abs/2103.01075}
}
This paper proposes Omnidirectional Representations from Transformers (OmniNet). In OmniNet, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to all tokens in the entire network. This can be interpreted as a form of extreme or intensive attention whose receptive field spans the entire width and depth of the network. To this end, the omnidirectional attention is learned via a meta-learner, which is essentially another self-attention based model.