Corpus ID: 214714403

Variational Transformers for Diverse Response Generation

Authors: Zhaojiang Lin, Genta Indra Winata, Peng Xu, Zihan Liu, Pascale Fung
Despite the great promise of Transformers in many sequence modeling tasks (e.g., machine translation), their deterministic nature hinders them from generalizing to high entropy tasks such as dialogue response generation. Previous work proposes to capture the variability of dialogue responses with a recurrent neural network (RNN)-based conditional variational autoencoder (CVAE). However, the autoregressive computation of the RNN limits the training efficiency. Therefore, we propose the… 
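The abstract above builds on the conditional variational autoencoder (CVAE) objective: maximize a reconstruction term minus the KL divergence between a recognition posterior q(z|x, c) and a prior p(z|c). A minimal numpy sketch of that objective, assuming diagonal Gaussians (function names are hypothetical, not from the paper):

```python
import numpy as np

def kl_gaussian(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians,
    summed over latent dimensions."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

def cvae_elbo(log_lik, mu_q, logvar_q, mu_p, logvar_p):
    """Evidence lower bound: reconstruction log-likelihood minus the KL
    between the recognition posterior and the conditional prior."""
    return log_lik - kl_gaussian(mu_q, logvar_q, mu_p, logvar_p)
```

When posterior and prior coincide the KL term vanishes and the bound reduces to the reconstruction term alone, which is the "KL vanishing" failure mode several of the papers below address.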


A Randomized Link Transformer for Diverse Open-Domain Dialogue Generation

A Randomized Link (RL) Transformer is proposed as an alternative to latent variable models, and empirical results show that, in terms of response diversity, the RL Transformer achieves performance comparable to latent variable models.

Transformer-based Conditional Variational Autoencoder for Controllable Story Generation

This paper integrates latent representation vectors with a Transformer-based pre-trained architecture to build a conditional variational autoencoder (CVAE), and demonstrates state-of-the-art conditional generation ability of the model, as well as its excellent representation learning capability and controllability.

Fuse It More Deeply! A Variational Transformer with Layer-Wise Latent Variable Inference for Text Generation

Della, a novel variational Transformer framework that learns a series of layer-wise latent variables with each inferred from those of lower layers and tightly coupled with the hidden states by low-rank tensor product, could better alleviate KL vanishing and improve both quality and diversity compared to several strong baselines.

Identifying Untrustworthy Samples: Data Filtering for Open-domain Dialogues with Bayesian Optimization

This paper presents a data filtering method for open-domain dialogues, which identifies untrustworthy samples from training data with a quality measure that linearly combines seven dialogue attributes, and proposes a training framework that integrates maximum likelihood estimation (MLE) and negative training method (NEG).

RVAE-LAMOL: Residual Variational Autoencoder to Enhance Lifelong Language Learning

The residual variational autoencoder (RVAE) is proposed to enhance LAMOL, a recent LLL model, by mapping different tasks into a limited unified semantic space, together with an identity task that makes the model discriminative enough to recognize which task a sample belongs to.

Text is NOT Enough: Integrating Visual Impressions into Open-domain Dialogue Generation

This paper proposes a framework to explicitly construct VIs based on pure-language dialogue datasets and utilize them for better dialogue understanding and generation, and shows that the proposed approach achieves superior performance over competitive baselines in terms of fluency, relatedness, and diversity.

A Variational AutoEncoder for Transformers with Nonparametric Variational Information Bottleneck

By regularising the cross-attention of a Transformer encoder-decoder with NVIB, this work proposes a nonparametric variational autoencoder (NVAE) and initial experiments show that the induced embedding space has the desired properties of a VAE for Transformers.

Variational Transformer Networks for Layout Generation

This work exploits the properties of self-attention layers to capture high level relationships between elements in a layout, and uses these as the building blocks of the well-known Variational Autoencoder (VAE) formulation.

Combining Variational Autoencoders and Transformer Language Models for Improved Password Generation

This work introduces a novel architecture that combines the expressive power of transformers with the natural sampling approach to text generation of variational autoencoders, and shows that it achieves state-of-the-art password matching performance across multiple benchmark datasets.

The Adapter-Bot: All-In-One Controllable Conversational Model

The Adapter-Bot is presented, a generative chatbot that uses a fixed backbone conversational model such as DialoGPT and triggers on-demand dialogue skills via different adapters that can be trained independently, thus allowing continual integration of skills without retraining the entire model.



Variational Autoregressive Decoder for Neural Response Generation

A novel model is proposed that sequentially introduces a series of latent variables to condition the generation of each word in the response sequence and leads to significant improvement on both relevance and diversity over state-of-the-art baselines.

Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders

This work presents a novel framework based on conditional variational autoencoders that captures discourse-level diversity in the encoder, uses latent variables to learn a distribution over potential conversational intents, and generates diverse responses using only greedy decoders.

Generative Deep Neural Networks for Dialogue: A Short Review

Recently proposed models based on generative encoder-decoder neural network architectures are reviewed and it is shown that these models have better ability to incorporate long-term dialogue history, to model uncertainty and ambiguity in dialogue, and to generate responses with high-level compositional structure.

Universal Transformers

The Universal Transformer (UT), a parallel-in-time self-attentive recurrent sequence model which can be cast as a generalization of the Transformer model and which addresses issues of parallelizability and global receptive field, is proposed.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

A Diversity-Promoting Objective Function for Neural Conversation Models

This work proposes using Maximum Mutual Information (MMI) as the objective function in neural models, and demonstrates that the proposed MMI models produce more diverse, interesting, and appropriate responses, yielding substantive gains in BLEU scores on two conversational datasets and in human evaluations.
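The MMI objective summarized above can be applied as a reranking criterion: score each candidate response T by log p(T|S) − λ·log p(T), so that generically probable replies ("I don't know") are penalized. A small illustrative sketch under that formulation (log-probabilities and λ here are made-up inputs, not from the paper):

```python
def mmi_score(log_p_t_given_s, log_p_t, lam=0.5):
    """MMI anti-LM scoring: log p(T|S) - lambda * log p(T).
    Subtracting a scaled language-model score penalizes generic responses."""
    return log_p_t_given_s - lam * log_p_t

def rerank(candidates, lam=0.5):
    """candidates: list of (response, log_p_t_given_s, log_p_t) tuples.
    Returns the candidate with the highest MMI score."""
    return max(candidates, key=lambda c: mmi_score(c[1], c[2], lam))
```

With λ = 0 the criterion reduces to ordinary maximum likelihood, so λ trades off relevance against diversity.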

A Neural Conversational Model

A simple approach to conversational modeling which uses the recently proposed sequence to sequence framework, and is able to extract knowledge from both a domain specific dataset, and from a large, noisy, and general domain dataset of movie subtitles.

Attention is All you Need

A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
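The attention mechanism at the core of the Transformer is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V. A minimal numpy sketch of that formula (single head, no masking):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
    Returns the attended values and the attention weight matrix."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights
```

Each row of the weight matrix is a distribution over key positions, so every output vector is a convex combination of the value vectors.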

Unsupervised Discrete Sentence Representation Learning for Interpretable Neural Dialog Generation

Two novel models are presented, DI-VAE and DI-VST, that improve VAEs and can discover interpretable semantics via either autoencoding or context prediction, and enhance encoder-decoder models with interpretable generation.

Variational Neural Machine Translation

This paper builds a neural posterior approximator conditioned on both the source and the target sides, and equips it with a reparameterization technique to estimate the variational lower bound, showing that the proposed variational neural machine translation achieves significant improvements over vanilla neural machine translation baselines.
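The reparameterization technique this last summary mentions rewrites a Gaussian sample z ~ N(μ, σ²) as z = μ + σ·ε with ε ~ N(0, I), moving the randomness outside the parameters so gradients can flow through μ and log σ². A minimal numpy sketch of that step (the training loop around it is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, logvar):
    """z = mu + sigma * eps with eps ~ N(0, I).
    Sampling noise is drawn independently of mu and logvar, so the
    transformation from parameters to sample is differentiable."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps
```

As the variance shrinks toward zero the sample collapses onto the mean, which makes the trick easy to sanity-check.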