Generalization Gap in Amortized Inference

Mingtian Zhang, Peter Hayes, David Barber
The ability of likelihood-based probabilistic models to generalize to unseen data is central to many machine learning applications such as lossless compression. In this work, we study the generalization of a popular class of probabilistic models, the Variational Auto-Encoder (VAE). We identify two generalization gaps that can affect the generalization ability of VAEs and show that the over-fitting phenomenon is usually dominated by the amortized inference network. Based on this…
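As background for the gaps the abstract refers to, the amortized-inference literature commonly decomposes the slack between the marginal log-likelihood and the ELBO as follows (standard notation, not taken verbatim from the paper; the paper's own gap definitions may differ in detail):

```latex
\log p_\theta(x) - \mathcal{L}(x;\theta,\phi)
  = \underbrace{\log p_\theta(x) - \mathcal{L}(x;\theta,q^*)}_{\text{approximation gap}}
  + \underbrace{\mathcal{L}(x;\theta,q^*) - \mathcal{L}(x;\theta,\phi)}_{\text{amortization gap}}
```

Here $q^*$ is the best posterior approximation within the variational family for a given $x$, and $\mathcal{L}(x;\theta,\phi)$ is the ELBO computed with the amortized encoder $q_\phi(z \mid x)$; the amortization gap is the part attributable to the inference network alone.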


Improving VAE-based Representation Learning
It is shown that by using a decoder that prefers to learn local features, the remaining global features can be well captured by the latent, which significantly improves the performance of a downstream classification task.
Laplacian Autoencoders for Learning Stochastic Representations
A Bayesian autoencoder for unsupervised representation learning is presented, which is trained using a novel variational lower bound of the autoencoder evidence that takes the shape of a Laplace approximation, resulting in improved performance across a multitude of downstream tasks.
InfoVAE: Information Maximizing Variational Autoencoders
It is shown that this model can significantly improve the quality of the variational posterior and make effective use of the latent features regardless of the flexibility of the decoding distribution, and that it outperforms competing approaches on multiple performance metrics.
Auto-Encoding Variational Bayes
A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
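The core of this algorithm is the reparameterization trick, which makes sampling from the approximate posterior differentiable. A minimal NumPy sketch (function names are illustrative, not from the paper):

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, diag(sigma^2)) as z = mu + sigma * eps, eps ~ N(0, I).

    Writing the sample this way keeps it differentiable with respect to mu
    and log_var, which is what allows stochastic gradient training of the
    encoder in a VAE.
    """
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

def gaussian_kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), the ELBO's regularizer."""
    return 0.5 * np.sum(np.square(mu) + np.exp(log_var) - 1.0 - log_var)
```

With `mu = 0` and `log_var = 0` the KL term is exactly zero, since the posterior then coincides with the standard-normal prior.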
On the Out-of-distribution Generalization of Probabilistic Image Modelling
This work proposes a Local Autoregressive model that exclusively models local image features to improve OOD performance, and employs the model to build a new lossless image compressor, NeLLoC (Neural Local Lossless Compressor), reporting state-of-the-art compression rates and model size.
The Autoencoding Variational Autoencoder
It is shown that a (nominally trained) VAE does not necessarily amortize inference for typical samples that it is capable of generating, and encoders trained with the self-consistency approach lead to representations that are robust to perturbations in the input introduced by adversarial attacks.
Bias and Generalization in Deep Generative Models: An Empirical Study
A framework to systematically investigate bias and generalization in deep generative models of images is proposed and inspired by experimental methods from cognitive psychology to characterize when and how existing models generate novel attributes and their combinations.
Diagnosing and Enhancing VAE Models
This work rigorously analyzes the VAE objective, and uses the corresponding insights to develop a simple VAE enhancement that requires no additional hyperparameters or sensitive tuning, all while retaining desirable attributes of the original VAE architecture.
Auxiliary Deep Generative Models
This work extends deep generative models with auxiliary variables which improves the variational approximation and proposes a model with two stochastic layers and skip connections which shows state-of-the-art performance within semi-supervised learning on MNIST, SVHN and NORB datasets.
Practical Lossless Compression with Latent Variables using Bits Back Coding
Bits Back with ANS (BB-ANS) is presented, a scheme to perform lossless compression with latent-variable models at a near-optimal rate; it is concluded that, with a sufficiently high-quality generative model, this scheme could achieve substantial improvements in compression rate with acceptable running time.
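The connection to generalization is direct: under bits-back coding the expected net code length per datum is, up to a small coding overhead, the negative ELBO, so any gap between train and test ELBO shows up as a worse compression rate on unseen data. In standard notation (not taken verbatim from the paper):

```latex
\mathbb{E}_{q_\phi(z \mid x)}\!\left[ -\log p_\theta(x \mid z) - \log p_\theta(z) + \log q_\phi(z \mid x) \right]
  = -\mathcal{L}(x;\theta,\phi)
```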
A note on the evaluation of generative models
This article reviews mostly known but often underappreciated properties relating to the evaluation and interpretation of generative models with a focus on image models and shows that three of the currently most commonly used criteria---average log-likelihood, Parzen window estimates, and visual fidelity of samples---are largely independent of each other when the data is high-dimensional.
Neural Discrete Representation Learning
Pairing these representations with an autoregressive prior, the model can generate high quality images, videos, and speech as well as doing high quality speaker conversion and unsupervised learning of phonemes, providing further evidence of the utility of the learnt representations.