Attention is All you Need

@inproceedings{Vaswani2017AttentionIA,
  title={Attention is All you Need},
  author={Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin},
  booktitle={NIPS},
  year={2017}
}
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more…
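The abstract's central building block is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, computed over query, key, and value matrices. A minimal NumPy sketch is below; the shapes, random inputs, and function name are illustrative only, not the paper's reference implementation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for 2-D Q, K, V arrays."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1 (softmax)
    return weights @ V                         # convex combination of value rows

# Toy example: 4 positions, model dimension 8 (arbitrary choices).
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(Q, K, V)   # shape (4, 8)
```

The 1/sqrt(d_k) scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into regions with vanishing gradients; the full model stacks multi-head variants of this operation with feed-forward layers.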

Citations

Publications citing this paper (showing 1-10 of 3,626 citations):

Crowd Transformer Network
  16 excerpts; cites methods & background; highly influenced

Semi-Supervised Disfluency Detection
  Feng Wang, Wei Chen, +3 authors, Bo Xu
  COLING, 2018
  13 excerpts; cites methods; highly influenced

A Call for Prudent Choice of Subword Merge Operations
  16 excerpts; cites methods & background; highly influenced

Almost Unsupervised Text to Speech and Automatic Speech Recognition
  ICML, 2019
  11 excerpts; cites background & methods; highly influenced

CITATION STATISTICS

  • 1,067 highly influenced citations
  • Averaged 1,202 citations per year from 2017 through 2019
  • 126% increase in citations per year in 2019 over 2018