• Corpus ID: 208617790

# Plug and Play Language Models: A Simple Approach to Controlled Text Generation

@article{Dathathri2020PlugAP,
  title={Plug and Play Language Models: A Simple Approach to Controlled Text Generation},
  author={Sumanth Dathathri and Andrea Madotto and Janice Lan and Jane Hung and Eric Frank and Piero Molino and Jason Yosinski and Rosanne Liu},
  journal={ArXiv},
  year={2020},
  volume={abs/1912.02164}
}
Large transformer-based language models (LMs) trained on huge text corpora have shown unparalleled generation capabilities. However, controlling attributes of the generated language (e.g. switching topic or sentiment) is difficult without modifying the model architecture or fine-tuning on attribute-specific data, both of which entail the significant cost of retraining. We propose a simple alternative: the Plug and Play Language Model (PPLM) for controllable language generation, which combines a…
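The control mechanism described above can be sketched in a few lines: at each decoding step, ascend the gradient of an attribute classifier's log-likelihood with respect to the LM's activations, then read next-token probabilities off the perturbed state. This is a minimal NumPy sketch with a random linear "LM head" and a random linear attribute classifier standing in for the real networks; all weights and sizes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H, A = 20, 8, 2  # vocab size, hidden size, number of attribute classes

W_lm = rng.normal(size=(H, V))    # toy LM head: hidden state -> vocab logits
W_attr = rng.normal(size=(H, A))  # toy attribute classifier head

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def perturb_hidden(h, target, steps=10, lr=0.1):
    """Ascend the gradient of log p(target attribute | h) in activation
    space, as PPLM does with the LM's hidden states at each step."""
    for _ in range(steps):
        p = softmax(W_attr.T @ h)
        # d log p(target) / dh for a linear-softmax classifier
        h = h + lr * (W_attr[:, target] - W_attr @ p)
    return h

h = rng.normal(size=H)
h_pert = perturb_hidden(h, target=1)

# the attribute likelihood rises, and next-token probabilities are read
# off the perturbed state, so generation drifts toward the attribute
next_token_probs = softmax(W_lm.T @ h_pert)
```

The key design point, matching the abstract, is that only activations are updated at decode time; the LM and classifier weights are never touched.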
206 Citations

## Citations

A Plug-and-Play Method for Controlled Text Generation
• Computer Science
EMNLP
• 2021
This work presents a plug-and-play decoding method for controlled language generation so simple it can be described in a single sentence: given a topic or keyword, the probability distribution over the vocabulary is shifted towards semantically similar words, and annealing this shift can be used to impose hard constraints on generation.
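The single-sentence method above can be sketched directly: add a cosine-similarity bonus toward a topic word's embedding to the LM's next-token logits, with a strength parameter that anneals the soft shift toward a hard constraint. The embeddings and logits below are random hypothetical stand-ins for a real LM's.

```python
import numpy as np

rng = np.random.default_rng(1)
V, D = 10, 4                    # vocab size, embedding dim

emb = rng.normal(size=(V, D))   # hypothetical word embeddings
lm_logits = rng.normal(size=V)  # base next-token logits from the LM

def shifted_probs(topic_word, strength):
    """Shift the next-token distribution toward words similar to the
    topic word; raising `strength` anneals toward a hard constraint."""
    topic = emb[topic_word]
    sim = emb @ topic / (np.linalg.norm(emb, axis=1) * np.linalg.norm(topic))
    z = lm_logits + strength * sim
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

p_soft = shifted_probs(topic_word=3, strength=1.0)   # gentle topical bias
p_hard = shifted_probs(topic_word=3, strength=50.0)  # annealed, near-hard
```

Since the topic word has maximal similarity to itself, increasing the strength concentrates probability mass on on-topic words.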
Attribute Alignment: Controlling Text Generation from Pre-trained Language Models
• Computer Science
EMNLP
• 2021
This work proposes a simple and flexible method for controlling text generation by aligning disentangled attribute representations, and shows large performance gains over previous methods while retaining fluency and diversity.
Sentence Bottleneck Autoencoders from Transformer Language Models
• Computer Science
EMNLP
• 2021
The construction of a sentence-level autoencoder from a pretrained, frozen transformer language model that achieves better quality than previous methods that extract representations from pretrained transformers on text similarity tasks, style transfer, and single-sentence classification tasks in the GLUE benchmark, while using fewer parameters than large pretrained models.
Directed Beam Search: Plug-and-Play Lexically Constrained Language Generation
• Computer Science
ArXiv
• 2020
Directed Beam Search is proposed, a plug-and-play method for lexically constrained language generation that can be applied to any language model, is easy to implement and can be used for general language generation.
Controlled Text Generation as Continuous Optimization with Multiple Constraints
• Computer Science
ArXiv
• 2021
This work formulates the decoding process as an optimization problem, which allows multiple attributes to be easily incorporated as differentiable constraints, and uses Lagrangian multipliers and gradient-descent-based techniques to generate the desired text.
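A toy instance of that Lagrangian formulation: maximize a concave "fluency" score f(y) subject to a differentiable attribute constraint g(y) ≤ 0, alternating gradient descent on y with projected gradient ascent on the multiplier. The quadratic f and linear g here are simple stand-ins for LM likelihood and attribute losses, not the paper's actual objectives.

```python
import numpy as np

a = np.array([1.0, 0.0])          # hypothetical unconstrained fluency optimum
c, b = np.array([1.0, 1.0]), 2.0  # attribute constraint: c @ y >= b

def f(y): return -np.sum((y - a) ** 2)  # concave "fluency" score
def g(y): return b - c @ y              # feasible when g(y) <= 0

y, lam = np.zeros(2), 0.0
for _ in range(2000):
    grad_y = 2 * (y - a) - lam * c       # gradient of -f(y) + lam * g(y)
    y = y - 0.01 * grad_y                # descent on the primal variable
    lam = max(0.0, lam + 0.01 * g(y))    # projected ascent on the multiplier

# at convergence the constraint is active: g(y) ~ 0, y ~ [1.5, 0.5]
```

The multiplier grows only while the constraint is violated, so the solution trades off fluency against the attribute exactly at the constraint boundary.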
Change or Not: A Simple Approach for Plug and Play Language Models on Sentiment Control
• Chen Xu
• Computer Science
AAAI
• 2021
PPLM (Dathathri et al. 2019) solves the conditional text generation problem without changing the architecture or weights of the pre-trained LM, instead utilizing an external sentiment classifier to compute a loss that is backpropagated to the original LM's hidden states at each time step.
SideControl: Controlled Open-domain Dialogue Generation via Additive Side Networks
• Computer Science
EMNLP
• 2021
A novel approach to control the generation of Transformer-based pretrained language models is proposed: the SIDECONTROL framework, which leverages a novel control attributes loss to incorporate useful control signals, and is shown to perform well with very limited training samples.
DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts
Because DEXPERTS operates only on the output of the pretrained LM, it is effective with (anti-)experts of smaller size, including when operating on GPT-3.
Neural Language Generation: Formulation, Methods, and Evaluation
• Computer Science
ArXiv
• 2020
There is no standard way to assess the quality of text produced by these generative models, which constitutes a serious bottleneck towards the progress of the field, so this survey will provide an informative overview of formulations, methods, and assessments of neural natural language generation.
Controllable Story Generation with External Knowledge Using Large-Scale Language Models
MEGATRON-CNTRL is a novel framework that uses large-scale language models and adds control to text generation by incorporating an external knowledge base and showcases the controllability of the model by replacing the keywords used to generate stories and re-running the generation process.

## References

Showing 1–10 of 60 references
CTRL: A Conditional Transformer Language Model for Controllable Generation
• Computer Science
ArXiv
• 2019
CTRL is released, a 1.63 billion-parameter conditional transformer language model, trained to condition on control codes that govern style, content, and task-specific behavior, providing more explicit control over text generation.
Can Unconditional Language Models Recover Arbitrary Sentences?
• Computer Science
NeurIPS
• 2019
This work introduces a pair of effective complementary methods for feeding representations into pretrained unconditional language models and a corresponding set of methods to map sentences into and out of this representation space, the *reparametrized sentence space*.
Language Models are Unsupervised Multitask Learners
• Computer Science
• 2019
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Multiple-Attribute Text Rewriting
• Computer Science
ICLR
• 2019
This paper proposes a new model that controls several factors of variation in textual data, replacing the usual condition of disentangled representations with a simpler mechanism based on back-translation, and demonstrates that the fully entangled model produces better generations.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
• Computer Science
NAACL
• 2019
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Simple Fusion: Return of the Language Model
• Computer Science
WMT
• 2018
This work investigates an alternative simple method to use monolingual data for NMT training that combines the scores of a pre-trained and fixed language model (LM) with the scores of a translation model (TM) while the TM is trained from scratch.
Transformer-XL: Attentive Language Models beyond a Fixed-Length Context
• Computer Science, Mathematics
ACL
• 2019
This work proposes a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence, which consists of a segment-level recurrence mechanism and a novel positional encoding scheme.
Controllable Text Generation
• Computer Science
ArXiv
• 2017
A new neural generative model is proposed which combines variational auto-encoders and holistic attribute discriminators for effective imposition of semantic structures in generic generation and manipulation of text.
Delete, Retrieve, Generate: a Simple Approach to Sentiment and Style Transfer
• Computer Science
NAACL
• 2018
This paper proposes simpler methods motivated by the observation that text attributes are often marked by distinctive phrases; the strongest method extracts content words by deleting phrases associated with the sentence's original attribute value, retrieves new phrases associated with the target attribute, and uses a neural model to fluently combine these into a final output.
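The delete/retrieve/generate pipeline described above is concrete enough to sketch. The marker sets, retrieval step, and combiner below are hypothetical stand-ins for the corpus statistics and neural model of the actual method.

```python
NEG_MARKERS = {"terrible", "bland"}       # phrases marking the source attribute
POS_PHRASES = ["delicious", "wonderful"]  # candidate target-attribute phrases

def delete(sentence):
    """Drop words distinctive of the original attribute, keeping content."""
    return [w for w in sentence.split() if w not in NEG_MARKERS]

def retrieve(content):
    """Stand-in retrieval: a real system picks a target-attribute phrase
    from a corpus by similarity to the content words."""
    return POS_PHRASES[0]

def generate(content, phrase):
    """Stand-in for the neural model that fluently combines the pieces."""
    return " ".join(content + [phrase])

content = delete("the food was terrible")
print(generate(content, retrieve(content)))  # -> the food was delicious
```

The appeal of the approach is that each stage is interpretable on its own: the deleted markers and retrieved phrases can be inspected directly.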
Improving Language Understanding by Generative Pre-Training
• Computer Science
• 2018
The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied.