DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts

Alisa Liu, Maarten Sap, Ximing Lu, Swabha Swayamdipta, Chandra Bhagavatula, Noah A. Smith, Yejin Choi
Despite recent advances in natural language generation, it remains challenging to control attributes of generated text. We propose DExperts: Decoding-time Experts, a decoding-time method for controlled text generation that combines a pretrained language model with “expert” LMs and/or “anti-expert” LMs in a product of experts. Intuitively, under the ensemble, tokens only get high probability if they are considered likely by the experts, and unlikely by the anti-experts. We apply DExperts to… 
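The product-of-experts intuition in the abstract can be illustrated with a small sketch. It assumes the three models share a vocabulary and each produce next-token logits, and it combines them by adding the scaled expert/anti-expert difference to the base logits before the softmax. The function names, the toy logits, and the weight `alpha` are illustrative assumptions, not taken from the text above.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def combine_logits(base, expert, anti_expert, alpha=1.0):
    """Shift base logits toward the expert and away from the anti-expert.

    alpha is an assumed steering weight; alpha = 0 recovers the base model.
    """
    return [b + alpha * (e - a) for b, e, a in zip(base, expert, anti_expert)]

# Toy next-token logits over a 3-token vocabulary (made-up numbers).
base = [2.0, 1.0, 0.5]
expert = [1.0, 2.0, 0.0]
anti_expert = [2.0, 0.0, 0.0]

probs = softmax(combine_logits(base, expert, anti_expert, alpha=1.0))
```

In this toy case token 1 is favored by the expert and disfavored by the anti-expert, so it ends up with the highest probability even though the base model preferred token 0.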

Mix and Match: Learning-free Controllable Text Generation using Energy Language Models

This work proposes Mix and Match LM, a global score-based alternative for controllable text generation that combines arbitrary pre-trained black-box models for achieving the desired attributes in the generated text without involving any fine-tuning or structural assumptions about the black-box models.

Improving Controllable Text Generation with Position-Aware Weighted Decoding

CAT-PAW, a novel framework based on existing weighted decoding methods, is proposed; it introduces a lightweight regulator that adjusts bias signals from the controller at different decoding positions to address the trade-off between control strength and fluency.

A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models

This is the first survey paper to summarize CTG techniques from the perspective of PLMs, and it is hoped it can help researchers in related fields to quickly track the academic frontier, providing them with a landscape of the area and a roadmap for future research.

Classifiers are Better Experts for Controllable Text Generation

This paper proposes a simple method for controllable text generation based on weighting logits with a free-form classifier, namely CAIF sampling, and shows that the proposed method significantly outperforms recent PPLM, GeDi, and DExperts on PPL and task accuracy metrics based on the external classifier of generated texts.

Extracting Latent Steering Vectors from Pretrained Language Models

The results suggest that frozen LMs can be effectively controlled through their latent steering space, and it is found that distances between steering vectors reflect sentence similarity when evaluated on a textual similarity benchmark (STS-B), outperforming pooled hidden states of models.

Diffusion-LM Improves Controllable Text Generation

A new non-autoregressive language model based on continuous diffusions that iteratively denoises a sequence of Gaussian vectors into word vectors, yielding a sequence of intermediate latent variables that enables a simple gradient-based algorithm to perform complex, controllable generation tasks.

Controlled Text Generation as Continuous Optimization with Multiple Constraints

This work formulates decoding as an optimization problem in which the multiple attributes to be controlled are incorporated as differentiable constraints, made possible by relaxing the discrete optimization to a continuous one.

Why is constrained neural language generation particularly challenging?

An extensive survey on the emerging topic of constrained neural language generation is presented, which formally defines and categorizes the problems of natural language generation by distinguishing between conditions and constraints, and reviews existing methods and evaluation metrics for constrained text generation.

Gamma Sampling: Fine-grained Controlling Language Models without Training

Gamma Sampling introduces attribute-related information (provided by humans or language models themselves) into the sampling process to guide language models to generate texts with desired attributes to achieve fine-grained controllable text generation while maintaining a fast generation speed.

Generating Training Data with Language Models: Towards Zero-Shot Language Understanding

This paper presents a simple approach that uses both types of PLMs for fully zero-shot learning of NLU tasks without requiring any task-specific data: a unidirectional PLM generates class-conditioned texts guided by prompts, which are used as the training data for fine-tuning a bidirectional PLM.



Plug and Play Language Models: A Simple Approach to Controlled Text Generation

The Plug and Play Language Model (PPLM) for controllable language generation is proposed, which combines a pretrained LM with one or more simple attribute classifiers that guide text generation without any further training of the LM.

GeDi: Generative Discriminator Guided Sequence Generation

GeDi is proposed as an efficient method for using smaller LMs as generative discriminators to guide generation from large LMs to make them safer and more controllable, and it is found that GeDi gives stronger controllability than the state-of-the-art method while also achieving generation speeds more than 30 times faster.

FUDGE: Controlled Text Generation With Future Discriminators

This work proposes Future Discriminators for Generation (FUDGE), a flexible and modular method for controlled text generation that, given a pre-existing generation model G, enables conditioning on a desired attribute a while requiring access only to G's output logits.
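A minimal sketch of one decoding step in this style: each candidate token's log-probability under the base model G is added to a discriminator's log-probability that the attribute a will hold if that token is chosen, and the result is renormalized. The function name and the toy numbers are illustrative assumptions; a real system would obtain both inputs from trained models.

```python
import math

def reweight_next_token(lm_logprobs, attr_logprobs):
    """Return P(token | prefix, a), proportional to P(token | prefix) * P(a | prefix + token).

    lm_logprobs:   log-probabilities from the base generator G
    attr_logprobs: log-probabilities from an attribute discriminator (assumed given)
    """
    scores = [lp + ap for lp, ap in zip(lm_logprobs, attr_logprobs)]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))  # log-sum-exp
    return [math.exp(s - log_z) for s in scores]

# Toy example over a 2-token vocabulary (made-up numbers).
lm = [math.log(0.6), math.log(0.4)]
attr = [math.log(0.1), math.log(0.9)]
conditioned = reweight_next_token(lm, attr)
```

Here the base model prefers token 0, but the discriminator strongly prefers token 1, so the reweighted distribution shifts most of its mass to token 1.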

Neural Text Generation with Unlikelihood Training

It is shown that the likelihood objective itself is at fault, resulting in a model that assigns too much probability to sequences containing repeats and frequent words, unlike those from the human training distribution, thus providing a strong alternative to existing techniques.

CTRL: A Conditional Transformer Language Model for Controllable Generation

CTRL is released, a 1.63 billion-parameter conditional transformer language model, trained to condition on control codes that govern style, content, and task-specific behavior, providing more explicit control over text generation.

The Curious Case of Neural Text Degeneration

By sampling text from the dynamic nucleus of the probability distribution, which allows for diversity while effectively truncating the less reliable tail of the distribution, the resulting text better demonstrates the quality of human text, yielding enhanced diversity without sacrificing fluency and coherence.
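Truncating the tail and sampling from the remaining "nucleus" can be sketched as follows: keep the smallest set of highest-probability tokens whose cumulative mass reaches a threshold, renormalize, and sample from that set. The function names and the threshold parameter `top_p` are illustrative assumptions.

```python
import random

def nucleus_filter(probs, top_p=0.9):
    """Return {token_id: renormalized_prob} for the smallest top-p set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:  # stop once the nucleus covers top_p mass
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

def sample_nucleus(probs, top_p=0.9, rng=random):
    """Sample one token id from the renormalized nucleus."""
    nucleus = nucleus_filter(probs, top_p)
    ids = list(nucleus)
    return rng.choices(ids, weights=[nucleus[i] for i in ids], k=1)[0]

# Toy distribution over a 4-token vocabulary (made-up numbers).
toy_probs = [0.5, 0.3, 0.15, 0.05]
nucleus = nucleus_filter(toy_probs, top_p=0.9)
token = sample_nucleus(toy_probs, top_p=0.9, rng=random.Random(0))
```

With these toy numbers the least likely token falls outside the nucleus and can never be sampled, while the other three keep their relative proportions.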

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

BART is presented, a denoising autoencoder for pretraining sequence-to-sequence models, which matches the performance of RoBERTa on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks.

Delete, Retrieve, Generate: a Simple Approach to Sentiment and Style Transfer

This paper proposes simpler methods motivated by the observation that text attributes are often marked by distinctive phrases, and the strongest method extracts content words by deleting phrases associated with the sentence’s original attribute value, retrieves new phrases associated with the target attribute, and uses a neural model to fluently combine these into a final output.

Improving Language Understanding by Generative Pre-Training

The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, improving upon the state of the art in 9 out of the 12 tasks studied.

Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks

It is consistently found that multi-phase adaptive pretraining offers large gains in task performance, and it is shown that adapting to a task corpus augmented using simple data selection strategies is an effective alternative, especially when resources for domain-adaptive pretraining might be unavailable.