Publications
Neural Text Generation with Unlikelihood Training
TLDR
We show that the likelihood objective itself is at fault, resulting in a model that assigns too much probability to sequences containing repeats and frequent words, unlike those from the human training distribution.
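As an illustration of the unlikelihood idea summarized above, here is a minimal sketch of a token-level unlikelihood penalty, assuming a PyTorch setting with hypothetical tensors `logits` (per-step vocabulary scores) and `neg_candidates` (token ids to discourage, e.g. tokens already generated in the context). This is a sketch of the penalty term, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def unlikelihood_penalty(logits, neg_candidates, eps=1e-8):
    """Penalize probability mass placed on negative candidate tokens.

    logits:         (batch, vocab) unnormalized scores for one decoding step
    neg_candidates: (batch, k) token ids to push down (e.g. tokens already
                    generated in the context, to discourage repetition)
    """
    probs = F.softmax(logits, dim=-1)            # (batch, vocab)
    p_neg = probs.gather(1, neg_candidates)      # (batch, k)
    # Unlikelihood term: -log(1 - p(neg_token | context)), summed over candidates.
    return -torch.log(torch.clamp(1.0 - p_neg, min=eps)).sum(dim=-1).mean()

# The total training loss would mix the usual likelihood (cross-entropy) term
# with this penalty, e.g.: loss = nll_loss + alpha * unlikelihood_penalty(...)
```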
Dialogue Natural Language Inference
TLDR
We propose a method which demonstrates that a model trained on Dialogue NLI can be used to improve the consistency of a dialogue model, and evaluate the method with human evaluation and with automatic metrics on a suite of evaluation sets designed to measure a dialogue model's consistency.
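One way an NLI model can be used to improve consistency, sketched below under assumptions not stated in the summary: score each candidate reply against the persona sentences with a hypothetical `nli_contradiction_prob` function and re-rank candidates so that contradictory replies are demoted. The function name and scoring scheme are illustrative, not the paper's API.

```python
def rerank_by_consistency(candidates, persona, nli_contradiction_prob):
    """Re-rank candidate replies, demoting ones that contradict the persona.

    candidates: list of (reply_text, generation_score) pairs
    persona:    list of persona sentences the reply must stay consistent with
    nli_contradiction_prob(premise, hypothesis) -> float in [0, 1]
    """
    def penalty(reply):
        # Worst-case contradiction against any persona sentence.
        return max(nli_contradiction_prob(p, reply) for p in persona)

    # Sort by generation score minus contradiction penalty.
    return sorted(candidates,
                  key=lambda c: c[1] - penalty(c[0]),
                  reverse=True)
```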
Non-Monotonic Sequential Text Generation
TLDR
We propose a framework for training models of text generation that operate in non-monotonic orders; the model directly learns good orders, without any additional annotation.
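In this non-monotonic setting, tokens are generated as a binary tree (each emitted token may later receive left and right children), and the final sentence is read off by an in-order traversal. A small illustrative sketch, with a hypothetical `Node` structure:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    token: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def in_order(node):
    """Recover the left-to-right sentence from a non-monotonically built tree."""
    if node is None:
        return []
    return in_order(node.left) + [node.token] + in_order(node.right)

# e.g. the model might first emit "sat", then fill in words to its left and right:
tree = Node("sat", left=Node("cat", left=Node("the")), right=Node("down"))
print(" ".join(in_order(tree)))  # "the cat sat down"
```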
Don't Say That! Making Inconsistent Dialogue Unlikely with Unlikelihood Training
TLDR
In this work we show how all of these problems can be addressed by extending the recently introduced unlikelihood loss (Welleck et al., 2019) to these cases.
Consistency of a Recurrent Language Model With Respect to Incomplete Decoding
TLDR
In this paper, we show that the distribution induced by a decoding algorithm can contradict this intended use; instead, the decoding algorithm may return improbable, infinite-length sequences.
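The failure mode described above can be made concrete: an "incomplete" decoder such as top-k sampling may exclude the end-of-sequence token from the candidate set at every step, so the induced distribution can place all its mass on sequences that never terminate. A toy sketch (the token ids, probabilities, and k value below are assumptions for illustration only):

```python
import random

EOS = 0

def topk_step(probs, k):
    """Sample one token from the k most probable tokens only."""
    topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in topk]
    return random.choices(topk, weights=weights)[0]

# Toy model: EOS always has some probability, but never enough to enter the top-k.
probs = [0.05, 0.50, 0.45]   # p(EOS)=0.05, p(token 1)=0.50, p(token 2)=0.45

# With k=2, EOS is excluded at every step, so decoding can never terminate;
# the loop is capped here, but the induced distribution has no finite-length support.
seq, steps = [], 0
while steps < 20:
    tok = topk_step(probs, k=2)
    if tok == EOS:
        break
    seq.append(tok)
    steps += 1
print(len(seq))  # always 20: EOS is unreachable under top-2 decoding
```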
Appendix: Non-Monotonic Sequential Text Generation
Oracle: For π∗ annealed, β is linearly annealed from 1.0 to 0.0 at a rate of 0.05 per epoch, after a burn-in period of 20 epochs during which β is not decreased. We use greedy decoding when π∗ coaching…
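A small sketch of the annealing schedule quoted above: β is held at 1.0 for a 20-epoch burn-in, then decreased by 0.05 per epoch until it reaches 0.0. Whether the first decrease lands on epoch 20 or 21 is not specified in the snippet; the version below starts decreasing right after the burn-in.

```python
def beta_at_epoch(epoch, burn_in=20, rate=0.05, start=1.0, end=0.0):
    """Linearly anneal beta after a burn-in period during which it is fixed."""
    if epoch < burn_in:
        return start
    return max(end, start - rate * (epoch - burn_in))

# beta stays at 1.0 for epochs 0-19 and reaches 0.0 at epoch 40:
print([round(beta_at_epoch(e), 2) for e in (0, 19, 20, 30, 40, 50)])
# [1.0, 1.0, 1.0, 0.5, 0.0, 0.0]
```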
Saliency-based Sequential Image Attention with Multiset Prediction
TLDR
We propose a hierarchical visual architecture that operates on a saliency map and uses a novel attention mechanism to sequentially focus on salient regions and take additional glimpses within those regions.
Loss Functions for Multiset Prediction
We study the problem of multiset prediction. The goal of multiset prediction is to train a predictor that maps an input to a multiset consisting of multiple items. Unlike existing problems in…
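To make the problem statement concrete: unlike a set, a multiset keeps duplicates, so an image with two dogs and one cat has the target {dog, dog, cat}, and a predictor is judged on item counts as well as identities. A tiny sketch of multiset comparison with Python's `Counter` (an illustration of the prediction target, not the paper's loss functions):

```python
from collections import Counter

target = Counter({"dog": 2, "cat": 1})        # ground-truth multiset
predicted = Counter(["dog", "cat", "cat"])    # model predicted dog, cat, cat

# Exact match requires matching multiplicities, not just the item set.
print(set(target) == set(predicted))   # True  (same items)
print(target == predicted)             # False (counts differ)

# Multiset intersection, useful for a precision/recall-style score.
overlap = target & predicted
print(sum(overlap.values()) / sum(target.values()))  # 2/3 ≈ 0.67
```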
MLE-guided parameter search for task loss minimization in neural sequence modeling
TLDR
We propose maximum likelihood guided parameter search (MGS), which samples from a distribution over update directions that is a mixture of random search around current parameters and around the maximum likelihood gradient, with each direction weighted by its improvement in the task loss.
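A schematic sketch of the sampling scheme described above, under simplifying assumptions: candidate directions are drawn either as random perturbations of the current parameters or as perturbations around the MLE gradient step, each candidate is scored by its task-loss improvement, and the applied update is a weighted mixture. `task_loss` and `mle_gradient` are hypothetical callables, and the step sizes and temperature are illustrative, not the paper's settings.

```python
import numpy as np

def mgs_step(theta, task_loss, mle_gradient,
             n_candidates=8, sigma=0.01, lr=0.1, temp=1.0):
    """One MGS-style update: mix random-search directions with directions
    sampled around the MLE gradient, weighted by task-loss improvement."""
    base = task_loss(theta)
    grad = mle_gradient(theta)
    directions, gains = [], []
    for i in range(n_candidates):
        noise = sigma * np.random.randn(*theta.shape)
        # Half the candidates perturb the parameters directly (random search),
        # half perturb the MLE gradient step.
        d = noise if i % 2 == 0 else -lr * grad + noise
        directions.append(d)
        gains.append(base - task_loss(theta + d))   # improvement in task loss
    weights = np.exp(np.array(gains) / temp)
    weights /= weights.sum()
    # Apply the weighted mixture of candidate directions.
    return theta + sum(w * d for w, d in zip(weights, directions))
```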
Sequential Graph Dependency Parser
TLDR
We propose a method for non-projective dependency parsing by incrementally predicting a set of edges.
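A minimal sketch of the incremental edge-prediction idea, under assumptions not stated in the summary: a hypothetical `score_edge(sentence, head, dep, edges)` scores attaching word `dep` to word `head` given the edges chosen so far, and edges are added greedily until every word has a head. This is an illustration of sequential edge prediction, not the paper's model.

```python
def parse_greedy(sentence, score_edge):
    """Build a dependency structure by predicting one edge at a time.

    sentence:   list of tokens; index 0 is treated as an artificial ROOT
    score_edge: callable (sentence, head, dep, edges_so_far) -> float
    Returns a set of (head, dep) index pairs.
    """
    n = len(sentence)
    edges, attached = set(), {0}           # ROOT starts out attached
    while len(attached) < n:
        # Score every edge from an already attached head to an unattached dependent.
        candidates = [(score_edge(sentence, h, d, edges), h, d)
                      for h in attached for d in range(n) if d not in attached]
        _, h, d = max(candidates)          # pick the highest-scoring edge
        edges.add((h, d))
        attached.add(d)
    return edges
```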