Deep Generative Modelling: A Comparative Review of VAEs, GANs, Normalizing Flows, Energy-Based and Autoregressive Models

Authors: Sam Bond-Taylor, Adam Leach, Yang Long, Chris G. Willcocks
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Deep generative models are a class of techniques that train deep neural networks to model the distribution of training samples. Research has fragmented into various interconnected approaches, each of which makes trade-offs including run-time, diversity, and architectural restrictions. In particular, this compendium covers energy-based models, variational autoencoders, generative adversarial networks, autoregressive models, and normalizing flows, in addition to numerous hybrid approaches. These…

The deep generative decoder: Using MAP estimates of representations
This work argues that it is worthwhile to investigate a much simpler approximation which finds representations and their distribution by maximizing the model likelihood via back-propagation, and calls it a Deep Generative Decoder (DGD).
A Simple Generative Network
This paper demonstrates that a very simple architecture (denoted SGN for its simplicity) is able to generate samples that are visually and quantitatively competitive with the aforementioned state-of-the-art methods.
The Devil is in the GAN: Defending Deep Generative Models Against Backdoor Attacks
Novel training-time attacks resulting in corrupted DGMs that synthesize regular data under normal operations and designated target outputs for inputs sampled from a trigger distribution are described, which allow adversaries to potentially undermine the integrity of entire machine learning development pipelines in a victim organization.
Chunked Autoregressive GAN for Conditional Waveform Synthesis
This paper's proposed model, Chunked Autoregressive GAN (CARGAN), reduces pitch error by 40-60%, reduces training time by 58%, maintains a fast generation speed suitable for real-time or interactive applications, and maintains or improves subjective quality.
An Overview of Variational Autoencoders for Source Separation, Finance, and Bio-Signal Applications
Applications of variational autoencoders for finance, speech/audio source separation, and biosignal applications are presented, and possible areas of research in improving performance of VAEs in particular and deep generative models in general are identified.
Pros and Cons of GAN Evaluation Measures: New Developments
A. Borji, Comput. Vis. Image Underst., 2022.
An Information-Theoretic Perspective on Proper Quaternion Variational Autoencoders
This paper analyzes the QVAE from an information-theoretic perspective, studying the ability of the H-proper model to approximate improper distributions as well as the built-in H-proper ones, and the loss of entropy due to the improperness of the input signal.
Validation Methods for Energy Time Series Scenarios From Deep Generative Models
An assessment of the validation methods currently used in the energy scenario generation literature shows that no single method sufficiently characterizes a scenario; ideally, validation should include multiple methods and be interpreted carefully in the context of scenarios over short time periods.
Bilateral Denoising Diffusion Models
Novel bilateral denoising diffusion models (BDDMs) are proposed, which take significantly fewer steps to generate high-quality samples and are efficient, simple to train, and capable of further improving any pre-trained DDPM by optimizing the inference noise schedules.
BIGRoC: Boosting Image Generation via a Robust Classifier
This work proposes a general model-agnostic technique for improving the image quality and the distribution fidelity of generated images, obtained by any generative model, based on a post-processing procedure via the guidance of a given robust classifier and without a need for additional training of the generative models.


NVAE: A Deep Hierarchical Variational Autoencoder
NVAE is the first successful VAE applied to natural images as large as 256$\times$256 pixels and achieves state-of-the-art results among non-autoregressive likelihood-based models on the MNIST, CIFAR-10, CelebA 64, and CelebA HQ datasets and it provides a strong baseline on FFHQ.
Diagnosing and Enhancing VAE Models
This work rigorously analyzes the VAE objective, and uses the corresponding insights to develop a simple VAE enhancement that requires no additional hyperparameters or sensitive tuning, all while retaining desirable attributes of the original VAE architecture.
MAE: Mutual Posterior-Divergence Regularization for Variational AutoEncoders
Variational Autoencoder (VAE), a simple and effective deep generative model, has led to a number of impressive empirical successes and spawned many advanced variants and theoretical investigations.
Distribution Augmentation for Generative Modeling
DistAug is presented, a simple and powerful method of regularizing generative models that applies augmentation functions to data and conditions the generative model on the specific function used, enabling aggressive augmentations more commonly seen in supervised and self-supervised learning.
Optimizing the Latent Space of Generative Networks
Generative Latent Optimization (GLO) is a framework for training deep convolutional generators using simple reconstruction losses. It enjoys many of the desirable properties of GANs: synthesizing visually appealing samples, interpolating meaningfully between samples, and performing linear arithmetic with noise vectors, all without the adversarial optimization scheme.
Stacked Generative Adversarial Networks
A novel generative model named Stacked Generative Adversarial Networks (SGAN) is proposed. It is trained to invert the hierarchical representations of a bottom-up discriminative network and is able to generate images of much higher quality than GANs without stacking.
Implicit Generation and Generalization in Energy-Based Models
This work presents techniques to scale MCMC-based EBM training on continuous neural networks, and shows its success on the high-dimensional data domains of ImageNet32x32, ImageNet128x128, CIFAR-10, and robotic hand trajectories, achieving better samples than other likelihood models and nearing the performance of contemporary GAN approaches.
Improved Training of Wasserstein GANs
This work proposes an alternative to clipping weights: penalize the norm of gradient of the critic with respect to its input, which performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning.
Semi-Amortized Variational Autoencoders
This work proposes a hybrid approach: use amortized variational inference (AVI) to initialize the variational parameters, then run stochastic variational inference (SVI) to refine them. This enables the use of rich generative models without experiencing the posterior-collapse phenomenon common in training VAEs for problems like text generation.
VAE with a VampPrior
This paper proposes to extend the variational auto-encoder (VAE) framework with a new type of prior called "Variational Mixture of Posteriors" prior, or VampPrior for short, which consists of a mixture distribution with components given by variational posteriors conditioned on learnable pseudo-inputs.
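
As a rough illustration of the VampPrior summary above, the sketch below evaluates the log density of a mixture prior whose components are the posteriors q(z | u_k) at K pseudo-inputs u_k. It assumes diagonal Gaussian posteriors; `toy_encoder`, the pseudo-input values, and all function names are hypothetical stand-ins, not the paper's code. In the method as summarized, the pseudo-inputs are learnable, whereas here they are fixed.

```python
import math


def toy_encoder(x):
    # Hypothetical encoder: returns the mean and log-variance of a diagonal
    # Gaussian q(z | x). A real VAE would use a trained neural network here.
    mean = [0.5 * v for v in x]
    log_var = [-1.0 for _ in x]
    return mean, log_var


def gaussian_log_density(z, mean, log_var):
    # Log density of a diagonal Gaussian evaluated at z.
    return sum(
        -0.5 * (math.log(2.0 * math.pi) + lv + (zi - m) ** 2 / math.exp(lv))
        for zi, m, lv in zip(z, mean, log_var)
    )


def vamp_prior_log_density(z, pseudo_inputs, encoder=toy_encoder):
    # log p(z) = log((1/K) * sum_k q(z | u_k)), computed with log-sum-exp
    # for numerical stability.
    log_terms = [gaussian_log_density(z, *encoder(u)) for u in pseudo_inputs]
    top = max(log_terms)
    log_sum = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return log_sum - math.log(len(pseudo_inputs))


# Assumption: fixed illustrative pseudo-inputs; in training they would be
# parameters updated by gradient descent together with the encoder.
pseudo_inputs = [[-2.0, 0.0], [0.0, 0.0], [2.0, 1.0]]
log_p = vamp_prior_log_density([0.0, 0.0], pseudo_inputs)
```

With a single pseudo-input the mixture reduces to one encoder posterior, which is a convenient sanity check for an implementation along these lines.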