Self-Attention with Relative Position Representations

@inproceedings{Shaw2018SelfAttentionWR,
  title={Self-Attention with Relative Position Representations},
  author={Peter Shaw and Jakob Uszkoreit and Ashish Vaswani},
  booktitle={NAACL-HLT},
  year={2018}
}
Relying entirely on an attention mechanism, the Transformer introduced by Vaswani et al. (2017) achieves state-of-the-art results for machine translation. In contrast to recurrent and convolutional neural networks, it does not explicitly model relative or absolute position information in its structure. Instead, it requires adding representations of absolute positions to its inputs. In this work we present an alternative approach, extending the self-attention mechanism to efficiently consider…
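As a rough illustration of the idea sketched in the abstract, the snippet below adds learned embeddings of clipped relative distances to the keys inside scaled dot-product self-attention. The function name, parameter shapes, and clipping value are illustrative assumptions rather than the paper's reference implementation; the paper additionally applies relative representations to the values, which is omitted here for brevity.

```python
# Hypothetical sketch (not the authors' code): single-head self-attention where
# embeddings of clipped relative distances are added to the keys, in the spirit
# of the relative position representations described in the abstract.
import numpy as np

def relative_self_attention(x, Wq, Wk, Wv, rel_k, max_dist):
    """x: (n, d) inputs; Wq/Wk/Wv: (d, d) projections;
    rel_k: (2*max_dist + 1, d) learned relative-position embeddings for keys."""
    n, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv

    # Clip pairwise offsets j - i to [-max_dist, max_dist], shift to valid indices.
    idx = np.clip(np.arange(n)[None, :] - np.arange(n)[:, None],
                  -max_dist, max_dist) + max_dist            # (n, n)
    a_k = rel_k[idx]                                         # (n, n, d)

    # Attention logits: content term q·k plus relative-position term q·a_k.
    logits = (q @ k.T + np.einsum('id,ijd->ij', q, a_k)) / np.sqrt(d)
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy usage with random parameters.
rng = np.random.default_rng(0)
n, d, max_dist = 5, 8, 2
x = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
rel_k = rng.normal(size=(2 * max_dist + 1, d))
print(relative_self_attention(x, Wq, Wk, Wv, rel_k, max_dist).shape)  # (5, 8)
```

Because offsets are clipped, the table of relative embeddings has a fixed size of 2*max_dist + 1 entries regardless of sequence length, which keeps the extra parameter cost small.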
543 Citations
Self-Attention with Structural Position Representations
Improve Transformer Models with Better Relative Position Embeddings
An Improved Relative Self-Attention Mechanism for Transformer with Application to Music Generation
Self-Attention and Dynamic Convolution Hybrid Model for Neural Machine Translation
Assessing the Ability of Self-Attention Networks to Learn Word Order
Joint Source-Target Self Attention with Locality Constraints

References

Effective Approaches to Attention-based Neural Machine Translation
Attention is All you Need
Graph Attention Networks
Rethinking the Inception Architecture for Computer Vision
Sequence to Sequence Learning with Neural Networks
Neural Machine Translation in Linear Time
Neural Machine Translation by Jointly Learning to Align and Translate
Convolutional Sequence to Sequence Learning