Attention Is All You Need

@inproceedings{Vaswani2017AttentionIA,
  title={Attention Is All You Need},
  author={Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin},
  booktitle={NIPS},
  year={2017}
}
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.
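The core operation behind this architecture is scaled dot-product attention, defined in the paper as Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The sketch below is a minimal NumPy illustration of that formula, not the authors' reference implementation; the function name, toy shapes, and masking convention are assumptions made for the example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Compute softmax(Q K^T / sqrt(d_k)) V over the last two axes."""
    d_k = Q.shape[-1]
    # Dot-product similarity of every query with every key, scaled by
    # sqrt(d_k) to keep the softmax out of its low-gradient regime.
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    if mask is not None:
        # Disallowed positions (e.g. future tokens in the decoder) are pushed
        # toward -inf so they get ~zero attention weight after the softmax.
        scores = np.where(mask, scores, -1e9)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy usage: 4 positions attending over 4 positions, with d_k = d_v = 8.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```

In the full model this operation is applied several times in parallel to different learned linear projections of Q, K, and V (multi-head attention), and the heads' outputs are concatenated and projected back to the model dimension.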


Key Quantitative Results

  • On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.2 after training for 4.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature.

Citations

Publications citing this paper (showing 1–10 of 1,829 citations; estimated 40% coverage).

Crowd Transformer Network

  • 16 excerpts; cites methods, background & results; highly influenced

Semi-Supervised Disfluency Detection

Feng Wang, Wei Chen, +3 authors Bo Xu
  • COLING, 2018
  • 13 excerpts; cites background & methods; highly influenced

A Call for Prudent Choice of Subword Merge Operations

  • 16 excerpts; cites background & methods; highly influenced

Adversarial Stability Training in Neural Machine Translation of Chinese-to-English Text

Mandy Lu, Kaylie Zhu
  • 2019
  • 6 excerpts; cites methods; highly influenced

Almost Unsupervised Text to Speech and Automatic Speech Recognition

  • ICML, 2019
  • 11 excerpts; cites background & methods; highly influenced

Area Attention

  • 14 excerpts; cites results, background & methods; highly influenced

Attention, Learn to Solve Routing Problems!

  • 14 excerpts; cites background & methods; highly influenced

Character-Level Language Modeling with Deeper Self-Attention

  • 7 excerpts; cites methods & background; highly influenced

Corpora Generation for Grammatical Error Correction

  • 5 excerpts; cites methods; highly influenced

CITATION STATISTICS

  • Citing publications span 2016–2019

  • 529 highly influenced citations

  • Averaged 356 citations per year over the last 3 years

  • 93% increase in citations per year in 2018 over 2017

