Convolutional Sequence to Sequence Learning
TLDR: The prevalent approach to sequence to sequence learning maps an input sequence to a variable-length output sequence via recurrent neural networks; this work instead introduces an architecture based entirely on convolutional neural networks.
  • Citations: 1,773
  • Influence: 247
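Because the model is built from convolutions rather than recurrence, every target position can be computed in parallel during training, with causal (left-only) padding keeping the decoder autoregressive. Below is a minimal PyTorch sketch of one such causal convolutional decoder layer with a gated linear unit; the layer sizes, class name, and the absence of attention are simplifications for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvGLULayer(nn.Module):
    """One decoder layer: left-padded 1D convolution + gated linear unit.

    Left padding ensures position t only sees positions <= t, so the whole
    target sequence can be processed in parallel during training.
    """
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.kernel_size = kernel_size
        # 2 * channels outputs: half for the values, half for the gates (GLU).
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        residual = x
        x = F.pad(x, (self.kernel_size - 1, 0))   # pad the left side only (causal)
        x = self.conv(x)                           # (batch, 2 * channels, time)
        x = F.glu(x, dim=1)                        # gated linear unit -> (batch, channels, time)
        return x + residual                        # residual connection


if __name__ == "__main__":
    layer = CausalConvGLULayer(channels=64)
    out = layer(torch.randn(2, 64, 10))            # 2 sequences, 10 time steps
    print(out.shape)                               # torch.Size([2, 64, 10])
```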
Sequence Level Training with Recurrent Neural Networks
TLDR: We propose a novel sequence-level training algorithm that directly optimizes the metric used at test time, such as BLEU or ROUGE.
  • Citations: 925
  • Influence: 144
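Metrics like BLEU are not differentiable, so sequence-level training treats them as a reward and uses a policy-gradient (REINFORCE-style) estimator. The sketch below shows the core loss computation under that framing; the function name, the greedy-decode baseline, and the tensor shapes are assumptions for illustration rather than the paper's exact training procedure.

```python
import torch

def sequence_level_loss(sampled_log_probs: torch.Tensor,
                        sampled_reward: torch.Tensor,
                        baseline_reward: torch.Tensor) -> torch.Tensor:
    """REINFORCE-style loss for a batch of sampled output sequences.

    sampled_log_probs: (batch, time) log-probabilities of the sampled tokens
    sampled_reward:    (batch,) e.g. sentence-level BLEU of each sample
    baseline_reward:   (batch,) e.g. BLEU of a greedy decode, used as a
                       variance-reducing baseline
    """
    advantage = (sampled_reward - baseline_reward).detach()   # (batch,)
    seq_log_prob = sampled_log_probs.sum(dim=1)               # (batch,)
    # Raise the likelihood of samples that beat the baseline, lower the rest.
    return -(advantage * seq_log_prob).mean()


if __name__ == "__main__":
    # Toy usage with random numbers standing in for model outputs and BLEU scores.
    log_probs = torch.log(torch.rand(4, 7)).requires_grad_()
    loss = sequence_level_loss(log_probs, torch.rand(4), torch.rand(4))
    print(loss.item())
```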
Language Modeling with Gated Convolutional Networks
TLDR: In this paper we develop a finite-context approach through stacked convolutions, which can be more efficient since they allow parallelization over sequential tokens.
  • Citations: 885
  • Influence: 135
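The nonlinearity between the stacked convolutions is the gated linear unit: one convolutional branch is passed through a sigmoid and gates the other element-wise,

```latex
% Gated linear unit: the sigmoid branch gates the linear branch element-wise.
% * denotes convolution, W and V are the two filter sets, \otimes is element-wise product.
h_l(\mathbf{X}) = (\mathbf{X} * \mathbf{W} + \mathbf{b}) \otimes \sigma(\mathbf{X} * \mathbf{V} + \mathbf{c})
```

Stacking these layers gives each token a fixed, finite context window, and all positions in a sequence can be processed at once, which is the efficiency argument in the summary above.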
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
TLDR: We present FAIRSEQ, an open-source sequence modeling toolkit written in PyTorch that is fast, extensible, and useful for both research and production.
  • Citations: 617
  • Influence: 70
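For a flavor of the toolkit, fairseq also publishes pretrained translation models through torch.hub. The snippet below follows the pattern shown in fairseq's documentation for its pretrained WMT'19 models, but the exact model identifier and the tokenizer/bpe options should be treated as assumptions that can vary between fairseq releases.

```python
import torch

# Load a pretrained English-to-German translation model via torch.hub.
# The model name and the tokenizer/bpe arguments mirror fairseq's published
# examples; they are assumptions here and may differ across versions.
en2de = torch.hub.load('pytorch/fairseq',
                       'transformer.wmt19.en-de.single_model',
                       tokenizer='moses', bpe='fastbpe')
en2de.eval()

print(en2de.translate('Machine translation is useful.'))
```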
A Neural Network Approach to Context-Sensitive Generation of Conversational Responses
TLDR: We present a novel response generation system that can be trained end to end on large quantities of unstructured Twitter conversations.
  • Citations: 684
  • Influence: 68
Abstractive Sentence Summarization with Attentive Recurrent Neural Networks
TLDR: We introduce a novel convolutional attention-based conditional recurrent neural network model for the problem of abstractive sentence summarization.
  • Citations: 535
  • Influence: 66
Neural Text Generation from Structured Data with Application to the Biography Domain
TLDR: This paper introduces a neural model for concept-to-text generation that scales to the large and very diverse problem of generating biographies from Wikipedia infoboxes.
  • Citations: 213
  • Influence: 61
Understanding Back-Translation at Scale
TLDR: An effective method to improve neural machine translation with monolingual data is to augment the parallel training corpus with back-translations of target-language sentences; this work studies the technique at large scale and compares ways of generating the synthetic source sentences.
  • Citations: 363
  • Influence: 58
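The augmentation step itself is easy to sketch: a reverse-direction (target-to-source) model translates monolingual target sentences, and the resulting synthetic pairs are mixed into the parallel data. The helper below is a hedged outline; `reverse_translate` is a placeholder for whatever backward model is available, not a fairseq API.

```python
from typing import Callable, List, Tuple

def back_translate(monolingual_target: List[str],
                   reverse_translate: Callable[[str], str],
                   parallel_corpus: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Augment a parallel corpus with back-translated synthetic pairs.

    monolingual_target: sentences available only in the target language
    reverse_translate:  a target-to-source translation function (placeholder)
    parallel_corpus:    existing (source, target) sentence pairs
    """
    synthetic = []
    for tgt in monolingual_target:
        src = reverse_translate(tgt)     # synthetic source side
        synthetic.append((src, tgt))     # target side stays human-written
    # The forward model is then trained on the union of real and synthetic pairs.
    return parallel_corpus + synthetic
```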
Pay Less Attention with Lightweight and Dynamic Convolutions
TLDR: We introduce dynamic convolutions, which are simpler and more efficient than self-attention.
  • Citations: 211
  • Influence: 48
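A dynamic convolution mixes a fixed-width window of context with a kernel that is predicted from the current time step and softmax-normalized, rather than comparing every pair of positions as self-attention does. The PyTorch sketch below is a simplified single-head version of that idea; the linear kernel predictor and the shapes are illustrative, and the paper's weight sharing and multi-head details are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv(nn.Module):
    """Single-head dynamic convolution (simplified).

    The kernel applied at position t is predicted from x_t itself and
    softmax-normalized, so context is mixed with position-dependent weights
    instead of attention over all pairs of positions.
    """
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.kernel_size = kernel_size
        self.kernel_predictor = nn.Linear(channels, kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels)
        B, T, C = x.shape
        K = self.kernel_size
        # One normalized kernel of K taps per position.
        kernels = F.softmax(self.kernel_predictor(x), dim=-1)    # (B, T, K)
        # Gather a causal window of the K most recent inputs for every position.
        x_padded = F.pad(x, (0, 0, K - 1, 0))                    # left-pad the time dimension
        windows = x_padded.unfold(1, K, 1)                       # (B, T, C, K)
        # Weighted sum of each window with its predicted kernel.
        return torch.einsum('btck,btk->btc', windows, kernels)


if __name__ == "__main__":
    out = DynamicConv(channels=32)(torch.randn(2, 9, 32))
    print(out.shape)                                             # torch.Size([2, 9, 32])
```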
Scaling Neural Machine Translation
TLDR: This paper shows that reduced precision and large-batch training can speed up training by nearly 5x on a single 8-GPU machine with careful tuning and implementation.
  • Citations: 274
  • Influence: 47
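Both ingredients, reduced-precision arithmetic and very large effective batches, can be illustrated with a generic PyTorch training step that combines automatic mixed precision with gradient accumulation. This uses today's torch.cuda.amp utilities purely as an illustration; it is not the paper's original implementation, and the model is assumed to return a scalar loss.

```python
import torch

def train_step(model, optimizer, scaler, batches, accumulation_steps=16):
    """One optimizer update accumulated over several half-precision forward passes.

    Gradient accumulation simulates a large batch on limited GPU memory;
    autocast/GradScaler provide the reduced-precision arithmetic.
    Expected setup (assumption): scaler = torch.cuda.amp.GradScaler(), and
    batches is an iterable of (src, tgt) pairs for which model(src, tgt)
    returns a scalar loss.
    """
    optimizer.zero_grad()
    for src, tgt in batches[:accumulation_steps]:
        with torch.cuda.amp.autocast():
            loss = model(src, tgt)
        # Divide so the accumulated gradient matches one big batch.
        scaler.scale(loss / accumulation_steps).backward()
    scaler.step(optimizer)
    scaler.update()
```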