Attention is All you Need
A new simple network architecture, the Transformer, is proposed. It is based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and is shown to generalize well to other tasks by applying it successfully to English constituency parsing with both large and limited training data.
Relational inductive biases, deep learning, and graph networks
It is argued that combinatorial generalization must be a top priority for AI to achieve human-like abilities, and that structured representations and computations are key to realizing this objective.
Self-Attention with Relative Position Representations
This work presents an alternative approach, extending the self-attention mechanism to efficiently consider representations of the relative positions, or distances between sequence elements, and evaluates it on the WMT 2014 English-to-German and English-to-French translation tasks.
Learning Whom to Trust with MACE
MACE (Multi-Annotator Competence Estimation) learns in an unsupervised fashion to identify which annotators are trustworthy and predict the correct underlying labels, and shows considerable improvements over standard baselines, both for predicted label accuracy and trustworthiness estimates.
Stand-Alone Self-Attention in Vision Models
- Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jonathon Shlens
- Computer Science, NeurIPS
- 13 June 2019
The results establish that stand-alone self-attention is an important addition to the vision practitioner's toolbox and is especially impactful when used in later layers.
Attention Augmented Convolutional Networks
- Irwan Bello, Barret Zoph, Ashish Vaswani, Jonathon Shlens, Quoc V. Le
- Computer Science, IEEE/CVF International Conference on Computer…
- 22 April 2019
It is found that Attention Augmentation leads to consistent improvements in image classification on ImageNet and object detection on COCO across many different models and scales, including ResNets and a state-of-the-art mobile constrained network, while keeping the number of parameters similar.
Tensor2Tensor for Neural Machine Translation
Tensor2Tensor is a library for deep learning models that is well-suited for neural machine translation and includes the reference implementation of the state-of-the-art Transformer model.
Image Transformer
This work generalizes a recently proposed model architecture based on self-attention, the Transformer, to a sequence modeling formulation of image generation with a tractable likelihood, and significantly increases the size of images the model can process in practice, despite maintaining significantly larger receptive fields per layer than typical convolutional neural networks.
Decoding with Large-Scale Neural Language Models Improves Translation
This work develops a new model that combines the neural probabilistic language model of Bengio et al., rectified linear units, and noise-contrastive estimation, and incorporates it into a machine translation system both by reranking k-best lists and by direct integration into the decoder.
Music Transformer: Generating Music with Long-Term Structure
It is demonstrated that a Transformer with the modified relative attention mechanism can generate minute-long compositions with compelling structure, generate continuations that coherently elaborate on a given motif, and, in a seq2seq setup, generate accompaniments conditioned on melodies.