• Publications
Convolutional Sequence to Sequence Learning
TLDR
This work introduces an architecture based entirely on convolutional neural networks, which outperforms the accuracy of the deep LSTM setup of Wu et al. (2016) on both WMT'14 English-German and WMT'14 English-French translation at an order of magnitude faster speed, both on GPU and CPU.
Language Modeling with Gated Convolutional Networks
TLDR
A finite-context approach through stacked convolutions is developed, which can be more efficient since it allows parallelization over sequential tokens; this is the first time a non-recurrent approach is competitive with strong recurrent models on these large-scale language tasks.
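As a rough illustration of the stacked-convolution idea, here is a minimal sketch of a causal gated convolutional block in PyTorch (the class name, layer sizes, and kernel width are illustrative assumptions, not the paper's exact configuration); all positions in the sequence are processed in one pass rather than step by step.
  # Minimal sketch of a gated linear unit (GLU) convolution block: the
  # convolution produces 2*d channels, half of which gate the other half.
  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  class GatedConvBlock(nn.Module):
      def __init__(self, d_model=512, kernel_size=4):
          super().__init__()
          # Left-pad so each position only sees previous tokens (causal).
          self.pad = kernel_size - 1
          self.conv = nn.Conv1d(d_model, 2 * d_model, kernel_size)

      def forward(self, x):                      # x: (batch, seq_len, d_model)
          residual = x
          x = x.transpose(1, 2)                  # (batch, d_model, seq_len)
          x = F.pad(x, (self.pad, 0))            # causal left padding
          x = self.conv(x)                       # (batch, 2*d_model, seq_len)
          x = F.glu(x, dim=1)                    # gated linear unit: A * sigmoid(B)
          return x.transpose(1, 2) + residual    # residual connection

  # All positions are computed in parallel, unlike in a recurrent model:
  h = GatedConvBlock()(torch.randn(2, 16, 512))  # -> (2, 16, 512)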
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
TLDR
Fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks and supports distributed training across multiple GPUs and machines.
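As a usage illustration, fairseq exposes pretrained models through torch.hub; the snippet below follows the pattern in the fairseq documentation, though the model identifier and keyword arguments differ across releases and should be treated as assumptions.
  # Hedged sketch: loading a pretrained fairseq translation model via torch.hub.
  # The hub id, tokenizer, and bpe arguments are taken from the fairseq README
  # and may not match every release.
  import torch

  en2de = torch.hub.load('pytorch/fairseq',
                         'transformer.wmt19.en-de.single_model',
                         tokenizer='moses', bpe='fastbpe')
  en2de.eval()
  print(en2de.translate('Machine learning is fun!'))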
Understanding Back-Translation at Scale
TLDR
This work broadens the understanding of back-translation and investigates a number of methods to generate synthetic source sentences, finding that, in all but resource-poor settings, back-translations obtained via sampling or noised beam outputs are most effective.
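A hedged sketch of the data-generation loop implied here, with a hypothetical reverse (target-to-source) model whose translate_sampled and translate_beam methods are placeholders rather than a real API:
  # Back-translation sketch: a target-to-source model turns monolingual
  # target-side text into synthetic source sentences. The paper's finding is
  # that sampled or noised-beam outputs usually beat pure beam search.
  def back_translate(monolingual_targets, reverse_model, use_sampling=True):
      synthetic_pairs = []
      for tgt in monolingual_targets:
          if use_sampling:
              # Sample from the model distribution instead of the argmax beam.
              src = reverse_model.translate_sampled(tgt)
          else:
              src = reverse_model.translate_beam(tgt)
          # Pair the synthetic source with the genuine target sentence.
          synthetic_pairs.append((src, tgt))
      return synthetic_pairs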
3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training
In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. We also introduce a simple and effective semi-supervised training method that leverages unlabeled video data.
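A minimal sketch of the temporal-convolution idea in PyTorch (the joint count, channel widths, and dilation schedule are illustrative assumptions): a window of 2D keypoints is mapped to a 3D pose through 1D convolutions with increasing dilation over the time axis.
  import torch
  import torch.nn as nn

  class TemporalPoseNet(nn.Module):
      def __init__(self, joints=17, channels=256):
          super().__init__()
          self.net = nn.Sequential(
              nn.Conv1d(joints * 2, channels, kernel_size=3, dilation=1),
              nn.ReLU(),
              nn.Conv1d(channels, channels, kernel_size=3, dilation=3),
              nn.ReLU(),
              nn.Conv1d(channels, channels, kernel_size=3, dilation=9),
              nn.ReLU(),
              nn.Conv1d(channels, joints * 3, kernel_size=1),
          )

      def forward(self, keypoints_2d):        # (batch, frames, joints*2)
          x = keypoints_2d.transpose(1, 2)    # (batch, joints*2, frames)
          return self.net(x).transpose(1, 2)  # (batch, frames_out, joints*3)

  # Receptive field here is 1 + 2*(1+3+9) = 27 frames per output pose.
  out = TemporalPoseNet()(torch.randn(1, 27, 34))  # -> (1, 1, 51)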
Neural Text Generation from Structured Data with Application to the Biography Domain
TLDR
A neural model for concept-to-text generation is introduced that scales to large, rich domains and significantly outperforms a classical Kneser-Ney language model adapted to this task by nearly 15 BLEU.
Scaling Neural Machine Translation
TLDR
This paper shows that reduced precision and large batch training can speed up training by nearly 5x on a single 8-GPU machine with careful tuning and implementation.
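A hedged sketch of those two ingredients using PyTorch's automatic mixed-precision utilities plus gradient accumulation to simulate a large batch; model, loader, and optimizer are placeholders, and the model is assumed to return a scalar loss.
  import torch

  def train_epoch(model, loader, optimizer, accumulation_steps=16):
      scaler = torch.cuda.amp.GradScaler()          # loss scaling for FP16 safety
      optimizer.zero_grad()
      for step, (src, tgt) in enumerate(loader):
          with torch.cuda.amp.autocast():           # run forward in FP16 where safe
              loss = model(src, tgt) / accumulation_steps
          scaler.scale(loss).backward()
          if (step + 1) % accumulation_steps == 0:  # simulate a large batch
              scaler.step(optimizer)
              scaler.update()
              optimizer.zero_grad()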
Label Embedding Trees for Large Multi-Class Tasks
TLDR
An algorithm for learning a tree structure of classifiers is proposed which, by optimizing the overall tree loss, provides superior accuracy to existing tree labeling methods, along with a method that learns to embed labels in a low-dimensional space and is faster than non-embedding approaches while being more accurate than existing embedding approaches.
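A minimal sketch of the label-embedding half of that idea (not the paper's exact algorithm, and it omits the tree of classifiers): inputs and labels are projected into a shared low-dimensional space, and prediction becomes a similarity search over label embeddings rather than a separate classifier per class.
  import torch
  import torch.nn as nn

  class LabelEmbeddingClassifier(nn.Module):
      def __init__(self, input_dim, num_classes, embed_dim=64):
          super().__init__()
          self.input_proj = nn.Linear(input_dim, embed_dim)
          self.label_embed = nn.Embedding(num_classes, embed_dim)

      def forward(self, x):
          z = self.input_proj(x)                  # (batch, embed_dim)
          scores = z @ self.label_embed.weight.T  # similarity to every label
          return scores.argmax(dim=-1)            # predicted class indices

  preds = LabelEmbeddingClassifier(1000, 10000)(torch.randn(4, 1000))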
A Convolutional Encoder Model for Neural Machine Translation
TLDR
A faster and simpler architecture based on a succession of convolutional layers is presented, which allows the source sentence to be encoded simultaneously, in contrast to recurrent networks whose computation is constrained by temporal dependencies.
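A minimal sketch of such an encoder in PyTorch (vocabulary size, depth, and widths are illustrative; the paper's attention mechanism and position embeddings are omitted): because the convolutions are not causal, every source position is encoded in parallel, unlike the causal language-model block sketched earlier.
  import torch
  import torch.nn as nn

  class ConvEncoder(nn.Module):
      def __init__(self, vocab_size=32000, d_model=256, layers=4, kernel_size=3):
          super().__init__()
          self.embed = nn.Embedding(vocab_size, d_model)
          self.convs = nn.ModuleList(
              nn.Conv1d(d_model, d_model, kernel_size, padding=kernel_size // 2)
              for _ in range(layers))

      def forward(self, tokens):                 # (batch, src_len)
          x = self.embed(tokens).transpose(1, 2) # (batch, d_model, src_len)
          for conv in self.convs:
              x = torch.relu(conv(x)) + x        # residual convolutional layer
          return x.transpose(1, 2)               # (batch, src_len, d_model)

  enc = ConvEncoder()(torch.randint(0, 32000, (2, 10)))  # -> (2, 10, 256)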
Efficient softmax approximation for GPUs
TLDR
This work proposes an approximate strategy to efficiently train neural-network-based language models over very large vocabularies by exploiting the unbalanced word distribution to form clusters that explicitly minimize the expected computational cost.
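This clustering strategy is available in PyTorch as nn.AdaptiveLogSoftmaxWithLoss; a small usage sketch follows, with illustrative cutoffs (frequent words go in the small head cluster, rare words in progressively larger but cheaper tail clusters).
  import torch
  import torch.nn as nn

  vocab_size, hidden = 50000, 512
  adaptive_softmax = nn.AdaptiveLogSoftmaxWithLoss(
      in_features=hidden,
      n_classes=vocab_size,
      cutoffs=[2000, 10000],   # head: 2k most frequent words, then two tail clusters
      div_value=4.0)           # each tail cluster uses a smaller projection

  hidden_states = torch.randn(8, hidden)             # e.g. one decoder state per token
  targets = torch.randint(0, vocab_size, (8,))
  result = adaptive_softmax(hidden_states, targets)  # returns (output, loss)
  print(result.loss)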
...