Learning to Write with Cooperative Discriminators

@article{Holtzman2018LearningTW,
  title={Learning to Write with Cooperative Discriminators},
  author={Ari Holtzman and Jan Buys and Maxwell Forbes and Antoine Bosselut and David Golub and Yejin Choi},
  journal={ArXiv},
  year={2018},
  volume={abs/1805.06087}
}
Recurrent Neural Networks (RNNs) are powerful autoregressive sequence models, but when used to generate natural language their output tends to be overly generic, repetitive, and self-contradictory. We postulate that the objective function optimized by RNN language models, which amounts to the overall perplexity of a text, is not expressive enough to capture the notion of communicative goals described by linguistic principles such as Grice's Maxims. We propose learning a mixture of multiple…
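The abstract above describes mixing an RNN language model with cooperating discriminators that encode communicative goals. Below is a minimal sketch, assuming the combination amounts to reranking candidate continuations by the LM log-probability plus weighted discriminator scores; the function names, weights, and greedy reranking loop are illustrative placeholders, not the paper's implementation.

# Minimal sketch (assumption: candidate continuations are scored by the base LM
# log-probability plus weighted discriminator scores). `lm_logprob` and the
# discriminator callables are hypothetical stand-ins.

def combined_score(context, continuation, lm_logprob, discriminators, weights):
    """Score a candidate continuation with the base LM plus discriminators."""
    score = lm_logprob(context, continuation)          # fluency term from the RNN LM
    for disc, weight in zip(discriminators, weights):  # communicative-goal terms
        score += weight * disc(context, continuation)
    return score

def rerank(context, candidates, lm_logprob, discriminators, weights):
    """Pick the candidate continuation with the highest combined score."""
    return max(
        candidates,
        key=lambda c: combined_score(context, c, lm_logprob, discriminators, weights),
    )

In the paper the discriminators and their mixture weights are learned; this sketch only shows how their scores would enter the decoding objective.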
Towards Coherent and Cohesive Long-form Text Generation
TLDR: This work proposes a new neural language model equipped with two neural discriminators that provide feedback signals at the sentence level (cohesion) and the paragraph level (coherence); the model is trained with a simple yet efficient variant of policy gradient called negative-critical sequence training.
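A heavily hedged sketch of a "negative-critical" policy-gradient update, assuming (a guess based on the name alone) that the discriminator's reward for a negative sample replaces a learned critic as the baseline; the function signature, shapes, and use of PyTorch are assumptions, not the paper's code.

import torch

def negative_critical_loss(log_probs, pos_reward, neg_reward):
    """REINFORCE-style loss whose baseline is the reward of a negative sample.

    log_probs:  per-token log-probabilities of the sampled sequence, shape (T,)
    pos_reward: discriminator reward for the generated sequence (scalar tensor)
    neg_reward: discriminator reward for a negative sample, acting as the critic
    """
    advantage = (pos_reward - neg_reward).detach()  # no gradient through the rewards
    return -(advantage * log_probs.sum())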
Attribute Alignment: Controlling Text Generation from Pre-trained Language Models
TLDR: This work proposes a simple and flexible method for controlling text generation by aligning disentangled attribute representations, and shows large performance gains over previous methods while retaining fluency and diversity.
Neural Language Generation: Formulation, Methods, and Evaluation
TLDR: There is no standard way to assess the quality of text produced by these generative models, which constitutes a serious bottleneck to progress in the field; this survey provides an informative overview of the formulations, methods, and assessment of neural natural language generation.
Towards Generating Long and Coherent Text with Multi-Level Latent Variable Models
TLDR: Extensive experimental results demonstrate that the proposed multi-level VAE model produces more coherent and less repetitive long text than baselines and can mitigate the posterior-collapse issue.
SDA: Improving Text Generation with Self Data Augmentation
TLDR: This paper proposes to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation; the approach is general and can easily be adapted to any MLE-based training procedure.
A Cross-Domain Transferable Neural Coherence Model
TLDR: This work proposes a local discriminative neural model with a much smaller negative sampling space that can efficiently learn against incorrect orderings, and it significantly outperforms previous state-of-the-art methods on a standard benchmark dataset from the Wall Street Journal corpus, as well as in multiple new, challenging settings of transfer to unseen categories of discourse on Wikipedia articles.
On-the-Fly Attention Modularization for Neural Generation
TLDR: These findings motivate on-the-fly attention modularization, a simple but effective method for injecting inductive biases into attention computation during inference to yield enhanced diversity and commonsense reasoning while maintaining fluency and coherence.
Plug and Play Language Models: A Simple Approach to Controlled Text Generation
TLDR: The Plug and Play Language Model (PPLM) for controllable language generation is proposed; it combines a pretrained LM with one or more simple attribute classifiers that guide text generation without any further training of the LM.
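A simplified sketch of the plug-and-play idea, assuming the attribute classifier's gradient is used at decoding time to nudge a hidden state toward the desired class; the classifier interface, step size, and number of steps below are illustrative assumptions, not the reference PPLM implementation.

import torch

def perturb_hidden(hidden, attribute_classifier, target_class, step_size=0.02, steps=3):
    """Nudge an LM hidden state toward an attribute via classifier gradients."""
    delta = torch.zeros_like(hidden, requires_grad=True)
    for _ in range(steps):
        logits = attribute_classifier(hidden + delta)
        loss = -torch.log_softmax(logits, dim=-1)[..., target_class].mean()
        loss.backward()                       # gradient of the attribute loss w.r.t. delta
        with torch.no_grad():
            delta -= step_size * delta.grad / (delta.grad.norm() + 1e-10)
            delta.grad.zero_()
    return (hidden + delta).detach()          # perturbed state used for the next-token logits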
The Curious Case of Neural Text Degeneration
TLDR: By sampling text from the dynamic nucleus of the probability distribution, which allows for diversity while effectively truncating the less reliable tail of the distribution, the resulting text better matches the quality of human text, yielding enhanced diversity without sacrificing fluency and coherence.
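A minimal sketch of nucleus (top-p) sampling as described above: keep the smallest set of tokens whose cumulative probability reaches p, renormalize, and sample from that set; the threshold value and 1-D logits shape are illustrative.

import torch

def nucleus_sample(logits, p=0.9):
    """Sample from the smallest token set whose cumulative probability is at least p."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_ids = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Keep tokens until the cumulative mass first reaches p (the top token is always kept).
    cutoff = int(torch.searchsorted(cumulative, torch.tensor(p)).item()) + 1
    keep_ids = sorted_ids[:cutoff]
    keep_probs = sorted_probs[:cutoff] / sorted_probs[:cutoff].sum()
    return keep_ids[torch.multinomial(keep_probs, 1)].item()

Unlike top-k sampling, the number of candidate tokens adapts to the shape of the distribution: peaked distributions keep few tokens, flat ones keep many.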
Neural Text Generation with Unlikelihood Training
TLDR: It is shown that the likelihood objective itself is at fault, resulting in models that assign too much probability to sequences containing repeats and frequent words, unlike sequences from the human training distribution; the proposed unlikelihood training provides a strong alternative to existing techniques.
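A sketch of a token-level unlikelihood term, assuming previously seen tokens serve as the negative candidates whose probability is pushed down alongside the usual likelihood term; the candidate-selection rule here is a simplification.

import torch

def unlikelihood_loss(log_probs, target, prev_tokens):
    """Likelihood term plus an unlikelihood penalty on negative candidate tokens.

    log_probs:   (V,) log-probabilities for the next token
    target:      index of the gold next token
    prev_tokens: earlier context tokens used as negative candidates
    """
    mle = -log_probs[target]                              # standard likelihood term
    negatives = [t for t in prev_tokens if t != target]   # never penalize the gold token
    ul = torch.tensor(0.0)
    for t in negatives:
        p = log_probs[t].exp().clamp(max=1.0 - 1e-6)      # clamp for numerical safety
        ul = ul - torch.log(1.0 - p)                      # -log(1 - p(negative))
    return mle + ul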

References

Showing 1-10 of 53 references
A Deep Reinforced Model for Abstractive Summarization
TLDR: A neural network model is proposed with a novel intra-attention that attends over the input and the continuously generated output separately, together with a new training method that combines standard supervised word prediction and reinforcement learning (RL) to produce higher-quality summaries.
Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling
TLDR: This work introduces a novel theoretical framework that facilitates better learning in language modeling, and shows that this framework leads to tying together the input embedding and the output projection matrices, greatly reducing the number of trainable variables.
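A minimal sketch of the tying described above: the output projection reuses the input embedding matrix, so a single parameter matrix serves both roles; the toy dimensions are arbitrary.

import torch.nn as nn

class TiedLM(nn.Module):
    """Tiny LM skeleton whose output projection shares weights with the embedding."""

    def __init__(self, vocab_size=10000, dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size, bias=False)
        self.out.weight = self.embed.weight   # tie: one (vocab_size x dim) matrix for both

    def forward(self, tokens):
        hidden, _ = self.rnn(self.embed(tokens))
        return self.out(hidden)               # logits over the vocabulary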
A Diversity-Promoting Objective Function for Neural Conversation Models
TLDR: This work proposes using Maximum Mutual Information (MMI) as the objective function in neural conversation models, and demonstrates that the proposed MMI models produce more diverse, interesting, and appropriate responses, yielding substantive gains in BLEU scores on two conversational datasets and in human evaluations.
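A minimal sketch of MMI-style reranking, assuming the anti-LM form log p(T|S) - lambda * log p(T) is used to score candidate responses; the two scoring callables and the lambda value are placeholders for a seq2seq model and a separate language model.

def mmi_rerank(candidates, log_p_t_given_s, log_p_t, lam=0.5):
    """Rerank responses by log p(T|S) - lambda * log p(T)."""
    return max(
        candidates,
        key=lambda t: log_p_t_given_s(t) - lam * log_p_t(t),
    )

Subtracting the unconditional language-model score penalizes bland, high-frequency responses, which is how the objective promotes diversity.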
Regularizing and Optimizing LSTM Language Models
TLDR: This paper proposes the weight-dropped LSTM, which uses DropConnect on hidden-to-hidden weights as a form of recurrent regularization, and introduces NT-ASGD, a variant of the averaged stochastic gradient method in which the averaging trigger is determined by a non-monotonic condition rather than being tuned by the user.
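A minimal illustration of DropConnect on the recurrent weights, the core of the weight-dropped LSTM: a dropout mask is applied to the hidden-to-hidden weight matrix once per forward pass. The helper below only shows the masking step, not the full LSTM wrapper.

import torch.nn.functional as F

def drop_connect(weight_hh, p=0.5, training=True):
    """Randomly zero entries of the hidden-to-hidden weight matrix."""
    if not training:
        return weight_hh
    return F.dropout(weight_hh, p=p, training=True)  # survivors are rescaled by 1/(1-p)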
Abstractive Sentence Summarization with Attentive Recurrent Neural Networks
TLDR: A conditional recurrent neural network (RNN) generates a summary of an input sentence, significantly outperforming the recently proposed state-of-the-art method on the Gigaword corpus while performing competitively on the DUC-2004 shared task.
Exploring the Limits of Language Modeling
TLDR: This work explores recent advances in Recurrent Neural Networks for large-scale language modeling, and extends current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and the complex, long-term structure of language.
The Neural Noisy Channel
TLDR: Experimental results on abstractive sentence summarisation, morphological inflection, and machine translation show that noisy channel models outperform direct models, and that they significantly benefit from increased amounts of unpaired output data that direct models cannot easily use.
Challenges in Data-to-Document Generation
TLDR: A new, large-scale corpus of data records paired with descriptive documents is introduced, a series of extractive evaluation methods for analyzing performance is proposed, and baseline results are obtained using current neural generation methods.
Generating High-Quality and Informative Conversation Responses with Sequence-to-Sequence Models
TLDR: This work focuses on the single-turn setting, introduces a stochastic beam-search algorithm with segment-by-segment reranking which injects diversity earlier in the generation process, and proposes a practical approach, called the glimpse model, for scaling to large datasets.
A Decomposable Attention Model for Natural Language Inference
We propose a simple neural architecture for natural language inference. Our approach uses attention to decompose the problem into subproblems that can be solved separately, thus making it trivially…