Publications
A Neural Attention Model for Abstractive Sentence Summarization
TLDR: In this work, we propose a fully data-driven approach to abstractive sentence summarization.
Citations: 1,682 · Influence: 219
OpenNMT: Open-Source Toolkit for Neural Machine Translation
TLDR: We describe an open-source toolkit for neural machine translation (NMT) that prioritizes efficiency, modularity, and extensibility.
Citations: 1,105 · Influence: 169
Character-Aware Neural Language Models
TLDR: We describe a simple neural language model that relies only on character-level inputs.
Citations: 1,265 · Influence: 149
Abstractive Sentence Summarization with Attentive Recurrent Neural Networks
TLDR: We introduce a novel convolutional attention-based conditional recurrent neural network model for the problem of abstractive sentence summarization.
Citations: 549 · Influence: 69
Challenges in Data-to-Document Generation
TLDR: We introduce a new, large-scale corpus of data records paired with descriptive documents, propose a series of extractive evaluation methods for analyzing performance, and obtain baseline results using current neural generation methods.
Citations: 231 · Influence: 64
Bottom-Up Abstractive Summarization
TLDR: This work explores the use of data-efficient content selectors to over-determine phrases in a source document that should be part of the summary.
Citations: 276 · Influence: 54
Adversarially Regularized Autoencoders
TLDR: We propose a flexible method for training deep latent variable models of discrete structures.
Citations: 162 · Influence: 46
Sequence-to-Sequence Learning as Beam-Search Optimization
TLDR: In this work, we introduce a model and beam-search training scheme, based on the work of Daume III and Marcu (2005), that extends seq2seq to learn global sequence scores.
Citations: 346 · Influence: 27
Sequence-Level Knowledge Distillation
TLDR: We demonstrate that standard knowledge distillation applied to word-level prediction can be effective for NMT, and introduce two novel sequence-level versions of knowledge distillation that further improve performance and, somewhat surprisingly, seem to eliminate the need for beam search.
Citations: 286 · Influence: 27
Dual Decomposition for Parsing with Non-Projective Head Automata
TLDR: This paper introduces algorithms for non-projective parsing based on dual decomposition.
Citations: 184 · Influence: 26