Do Sequence-to-sequence VAEs Learn Global Features of Sentences?

Tom Bosc and Pascal Vincent
A longstanding goal in NLP is to compute global sentence representations. Such representations would be useful for sample-efficient semi-supervised learning and controllable text generation. To learn to represent global and local information separately, Bowman et al. (2016) proposed to train a sequence-to-sequence model with the variational auto-encoder (VAE) objective. What precisely is encoded in these latent variables, which are expected to capture global features? We measure which words benefit most…
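For reference, the VAE objective mentioned above is usually written as the evidence lower bound (ELBO). The notation here is an assumption and does not come from the abstract: x is a sentence, z the latent variable, q_\phi the encoder (approximate posterior), p_\theta the decoder, and p(z) the prior.

```latex
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
\;-\; \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)
```

The first term rewards reconstruction of the sentence from z. The second term penalizes information stored in z, which is why a strong autoregressive decoder can end up ignoring the latent variable.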

Exploring Story Generation with Multi-task Objectives in Variational Autoencoders

This work explores combining BERT and GPT-2 to build a variational autoencoder (VAE) and extending it with additional objectives to learn global features such as story topic and discourse relations.

Enhancing Response Relevance and Emotional Consistency for Dialogue Response Generation

This work proposes using contrastive learning to generate positive and negative samples for training, which enriches the latent variable representation with global information about the sentence and yields more relevant responses.


On the Effect of Isotropy on VAE Representations of Text

It is illustrated that the isotropic Gaussian posterior (IGP) effectively encourages isotropy in the representations, inducing a more discriminative latent space, which translates into much better classification performance, robustness to input perturbation, and generative behavior.

Step-unrolled Denoising Autoencoders for Text Generation

A new generative model of text, Step-unrolled Denoising Autoencoder (SUNDAE), that does not rely on autoregressive models and converges in fewer iterations than diffusion methods, while qualitatively producing better samples on natural language datasets.

The Neglected Sibling: Isotropic Gaussian Posterior for VAE

This paper proposes a simple modification to Variational Autoencoders by using an Isotropic Gaussian Posterior (IGP) that allows for better utilisation of their latent representation space to avoid the sub-optimal behavior of VAEs related to inactive dimensions in the representation space.

Unsupervised Representation Disentanglement of Text: An Evaluation on Synthetic Datasets

This work is the first attempt at the intersection of unsupervised representation disentanglement and text, and provides the experimental framework and datasets for examining future developments in this direction.

On the Low-density Latent Regions of VAE-based Language Models

A simple hole-detection algorithm based on the neighbour consistency between VAE’s input, latent, and output semantic spaces is proposed, which implies that large-scale low-density latent holes may not exist in the latent space.



Sequence to Sequence Learning with Neural Networks

This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and the target sentence, which made the optimization problem easier.

Generating Sentences from a Continuous Space

This work introduces and studies an RNN-based variational autoencoder generative model that incorporates distributed latent representations of entire sentences, which allows it to explicitly model holistic properties of sentences such as style, topic, and high-level syntactic features.

Unsupervised Controllable Text Generation with Global Variation Discovery and Disentanglement

This work makes the first successful attempt to use VAEs to achieve controllable text generation without supervision by decomposing the latent space of the VAE into two parts: one incorporates structural constraints to capture dominant global variations implicitly present in the data; the other is unstructured and is used for the reconstruction of the source sentences.

Conditional Variational Autoencoder for Neural Machine Translation

This is the first reported conditional variational model for text that meaningfully utilizes the latent variable without weakening the translation model, and is extended with a co-attention mechanism motivated by Parikh et al. in the inference network.

Improved Variational Autoencoders for Text Modeling using Dilated Convolutions

It is shown that, with the right decoder, a VAE can outperform LSTM language models, and perplexity gains are demonstrated on two datasets, representing the first positive experimental result on the use of VAEs for generative modeling of text.

Unsupervised Abstractive Sentence Summarization using Length Controlled Variational Autoencoder

An unsupervised approach to summarizing sentences in an abstractive way using a variational autoencoder, showing that shorter sentences cannot beat a simple baseline but yield higher ROUGE scores than trying to reconstruct the whole sentence.

Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn tasks such as question answering, machine translation, reading comprehension, and summarization without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing

A cyclical annealing schedule is proposed, which simply repeats the process of increasing β multiple times and allows more meaningful latent codes to be learned progressively by using the results of previous learning cycles as a warm restart.
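As a rough illustration of the schedule described above, the sketch below computes a KL weight β that ramps from 0 to 1 and then restarts, several times over training. The function name, the parameter names, the linear ramp, and the default values are all assumptions made for illustration; the summary above does not specify them.

```python
def cyclical_beta(step, total_steps, n_cycles=4, ramp_fraction=0.5):
    """Return a KL weight beta in [0, 1] for a 0-indexed training step.

    Hypothetical helper: names, defaults, and the linear ramp are assumptions.
    """
    # Length of one annealing cycle, in training steps (may be fractional).
    cycle_len = total_steps / n_cycles
    # Relative position inside the current cycle, in [0, 1).
    tau = (step % cycle_len) / cycle_len
    # Increase beta linearly during the first part of the cycle, then hold at 1.
    return min(1.0, tau / ramp_fraction)


if __name__ == "__main__":
    # Print the weight every 5 steps over a toy run of 100 steps.
    for s in range(0, 100, 5):
        print(s, round(cyclical_beta(s, total_steps=100), 2))
```

In a training loop, the returned value would multiply the KL term of the VAE loss, so each cycle starts close to a plain autoencoder and ends at the full VAE objective.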

Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders

This work presents a novel framework based on conditional variational autoencoders that captures discourse-level diversity in the encoder, uses latent variables to learn a distribution over potential conversational intents, and generates diverse responses using only greedy decoders.