Corpus ID: 13756489

Attention is All you Need

@inproceedings{Vaswani2017AttentionIA,
  title={Attention is All you Need},
  author={Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and L. Kaiser and Illia Polosukhin},
  booktitle={NIPS},
  year={2017}
}
  • Published in NIPS 2017
  • Computer Science
  • The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. [...] Key result: We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
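
For readers skimming this page, the attention mechanism the abstract refers to is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal NumPy sketch of that formula; the function name, the toy shapes, and the masking convention are illustrative assumptions, not taken from the paper's released code.

import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Minimal sketch of softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    # Similarity scores between queries and keys, scaled by sqrt(d_k).
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    if mask is not None:
        # Positions where mask is False get a large negative score,
        # so they receive (near-)zero attention weight after the softmax.
        scores = np.where(mask, scores, -1e9)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output is a weighted sum of the value vectors.
    return weights @ V

# Toy example (shapes are arbitrary): 4 queries over 6 key/value positions, d_k = 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
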
    15,021 Citations
    Selected citing papers:
    • Weighted Transformer Network for Machine Translation (78 citations, highly influenced)
    • How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures (43 citations, highly influenced)
    • Transformer++ (highly influenced)
    • A Simple but Effective Way to Improve the Performance of RNN-Based Encoder in Neural Machine Translation Task
    • Joint Source-Target Self Attention with Locality Constraints (12 citations, highly influenced)
    • Self-Attention and Dynamic Convolution Hybrid Model for Neural Machine Translation (highly influenced)
    • An Analysis of Encoder Representations in Transformer-Based Machine Translation (97 citations, highly influenced)

    References

    Showing 1-10 of 42 references:
    • Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation (152 citations)
    • Sequence to Sequence Learning with Neural Networks (11,464 citations)
    • A Deep Reinforced Model for Abstractive Summarization (727 citations)
    • Can Active Memory Replace Attention? (41 citations)
    • End-To-End Memory Networks (1,621 citations)
    • Structured Attention Networks (230 citations)
    • Multi-task Sequence to Sequence Learning (577 citations)
    • Convolutional Sequence to Sequence Learning (1,767 citations, highly influential)
    • Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (539 citations)