Corpus ID: 13756489

Attention is All you Need

@article{Vaswani2017AttentionIA,
  title={Attention is All you Need},
  author={Ashish Vaswani and Noam M. Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin},
  journal={ArXiv},
  year={2017},
  volume={abs/1706.03762}
}
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. [...] Key Result: We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
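The operation at the heart of the Transformer is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V, as defined in the paper. A minimal single-head NumPy sketch follows; the single-head setting, the absence of masking, and the toy shapes are simplifying assumptions for illustration, not the paper's reference implementation:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, per the paper.
    # Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v); single head, no masking.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (n_q, n_k) similarity logits
    scores -= scores.max(axis=-1, keepdims=True)     # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax over keys
    return weights @ V                               # (n_q, d_v) attention output

# Toy usage: 3 query positions attending over 4 key/value positions.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 16))
print(scaled_dot_product_attention(Q, K, V).shape)   # (3, 16)

The 1/sqrt(d_k) scaling keeps the dot-product logits from growing with d_k and pushing the softmax into regions with vanishing gradients, which is the rationale the paper gives for the scaling factor.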
Citations

Weighted Transformer Network for Machine Translation
How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures
Transformer++
A Simple but Effective Way to Improve the Performance of RNN-Based Encoder in Neural Machine Translation Task
Joint Source-Target Self Attention with Locality Constraints
Self-Attention and Dynamic Convolution Hybrid Model for Neural Machine Translation
An Analysis of Encoder Representations in Transformer-Based Machine Translation

References

Showing 1-10 of 42 references
Sequence to Sequence Learning with Neural Networks
A Deep Reinforced Model for Abstractive Summarization
Can Active Memory Replace Attention?
End-To-End Memory Networks
Structured Attention Networks
Convolutional Sequence to Sequence Learning
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer