Corpus ID: 213294407

Learning Deep-Latent Hierarchies by Stacking Wasserstein Autoencoders

@article{Gaujac2020LearningDH,
  title={Learning Deep-Latent Hierarchies by Stacking Wasserstein Autoencoders},
  author={Benoit Gaujac and Ilya Feige and David Barber},
  journal={ArXiv},
  year={2020},
  volume={abs/2010.03467}
}
Probabilistic models with hierarchical latent-variable structures provide state-of-the-art results amongst non-autoregressive, unsupervised density-based models. However, the most common approach to training such models, based on Variational Autoencoders (VAEs), often fails to leverage deep-latent hierarchies; successful approaches require complex inference and optimisation schemes. Optimal Transport is an alternative, non-likelihood-based framework for training generative models with appealing…
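For context, the building block being stacked is the Wasserstein Autoencoder (WAE). Sketched here is the standard single-layer WAE objective rather than the paper's own stacked objective: it minimises a reconstruction cost under a penalty matching the aggregate posterior to the prior,

$\min_{Q(Z|X)} \; \mathbb{E}_{P_X}\,\mathbb{E}_{Q(Z|X)}\big[c(X, G(Z))\big] \;+\; \lambda\, \mathcal{D}_Z(Q_Z, P_Z)$,

where $Q_Z = \mathbb{E}_{P_X}[Q(Z|X)]$ is the aggregate posterior, $c$ a cost such as squared error, $P_Z$ the prior over latents, $\mathcal{D}_Z$ a divergence (e.g. MMD or an adversarial critic), and $\lambda$ a penalty weight.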


References

Showing 1–10 of 37 references
Ladder Variational Autoencoders
TLDR: A new inference model, the Ladder Variational Autoencoder, is proposed that recursively corrects the generative distribution by a data-dependent approximate likelihood, in a process resembling the recently proposed Ladder Network.
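As a reminder of the mechanism (notation mine, following the LVAE's precision-weighted merge), each layer's posterior combines a bottom-up, data-dependent estimate $(\hat{\mu}_i, \hat{\sigma}_i^2)$ with the top-down generative estimate $(\tilde{\mu}_i, \tilde{\sigma}_i^2)$:

$\sigma_{q,i}^2 = \big(\hat{\sigma}_i^{-2} + \tilde{\sigma}_i^{-2}\big)^{-1}, \qquad \mu_{q,i} = \sigma_{q,i}^2\,\big(\hat{\mu}_i\,\hat{\sigma}_i^{-2} + \tilde{\mu}_i\,\tilde{\sigma}_i^{-2}\big)$.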
BIVA: A Very Deep Hierarchy of Latent Variables for Generative Modeling
TLDR: This paper introduces the Bidirectional-Inference Variational Autoencoder (BIVA), characterized by a skip-connected generative model and an inference network formed by a bidirectional stochastic inference path, and shows that BIVA reaches state-of-the-art test likelihoods, generates sharp and coherent natural images, and uses the hierarchy of latent variables to capture different aspects of the data distribution.
Importance Weighted Autoencoders
TLDR: The importance weighted autoencoder (IWAE) is a generative model with the same architecture as the VAE but with a strictly tighter log-likelihood lower bound derived from importance weighting; empirically, IWAEs learn richer latent-space representations than VAEs, leading to improved test log-likelihood on density-estimation benchmarks.
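The tighter bound in question is the standard $k$-sample importance-weighted bound (stated here for reference, not specific to this page):

$\mathcal{L}_k(x) = \mathbb{E}_{z_1,\dots,z_k \sim q(z|x)}\Big[\log \frac{1}{k}\sum_{i=1}^{k} \frac{p(x, z_i)}{q(z_i|x)}\Big] \le \log p(x)$,

which recovers the usual ELBO at $k=1$ and is non-decreasing in $k$.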
An Architecture for Deep, Hierarchical Generative Models
We present an architecture which lets us train deep, directed generative models with many layers of latent variables. We include deterministic paths between all latent variables and the generated…
Gaussian mixture models with Wasserstein distance
TLDR: This paper finds the discrete latent variable to be fully leveraged by the model when trained, without any modifications to the objective function or significant fine-tuning.
Auxiliary Deep Generative Models
TLDR: This work extends deep generative models with auxiliary variables, which improves the variational approximation, and proposes a model with two stochastic layers and skip connections that shows state-of-the-art performance within semi-supervised learning on the MNIST, SVHN and NORB datasets.
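A rough sketch of the auxiliary-variable idea (my notation; see the paper for the exact factorisation): the generative model is extended to $p(x, z, a) = p(a|x, z)\,p(x|z)\,p(z)$, leaving $p(x)$ unchanged, while inference uses $q(a, z|x) = q(a|x)\,q(z|a, x)$, giving the bound $\log p(x) \ge \mathbb{E}_{q(a,z|x)}\big[\log \tfrac{p(a|x,z)\,p(x|z)\,p(z)}{q(a|x)\,q(z|a,x)}\big]$; marginalising over $a$ yields a richer, non-Gaussian $q(z|x)$.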
Learning Generative Models with Sinkhorn Divergences
TLDR: This paper presents the first tractable computational method to train large-scale generative models using an optimal transport loss, and tackles three issues by relying on two key ideas: entropic smoothing, which turns the original OT loss into one that can be computed using Sinkhorn fixed-point iterations; and algorithmic (automatic) differentiation of these iterations.
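A minimal sketch of the Sinkhorn fixed-point iterations referred to above (plain NumPy; function names and defaults are my own, and the differentiation through the iterations used to train the generator is not shown):

import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iters=200):
    # Entropy-regularised OT between histograms a (n,) and b (m,)
    # with cost matrix C (n, m). Returns the transport plan and cost.
    K = np.exp(-C / eps)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)             # scale columns to match marginal b
        u = a / (K @ v)               # scale rows to match marginal a
    P = u[:, None] * K * v[None, :]   # plan = diag(u) K diag(v)
    return P, float(np.sum(P * C))

# Example: uniform histograms on 5 points with squared-distance cost.
x = np.linspace(0.0, 1.0, 5)
C = (x[:, None] - x[None, :]) ** 2
a = b = np.full(5, 0.2)
plan, cost = sinkhorn(a, b, C)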
Learning Hierarchical Features from Deep Generative Models
TLDR: It is proved that hierarchical latent variable models do not take advantage of the hierarchical structure when trained with some existing variational methods, and some limitations on the kind of features existing models can learn are provided.
From optimal transport to generative modeling: the VEGAN cookbook
We study unsupervised generative modeling in terms of the optimal transport (OT) problem between true (but unknown) data distribution $P_X$ and the latent variable model distribution $P_G$. We show…
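The central identity exploited there (stated informally, for a deterministic decoder $G$) is that the OT cost between $P_X$ and the model $P_G$ can be rewritten over encoders whose aggregate posterior matches the prior,

$W_c(P_X, P_G) = \inf_{Q(Z|X):\, Q_Z = P_Z} \mathbb{E}_{P_X}\,\mathbb{E}_{Q(Z|X)}\big[c(X, G(Z))\big]$,

which is the constraint that WAEs relax with a penalty term.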
Learning Hierarchical Priors in VAEs
TLDR: This work proposes to learn a hierarchical prior in the context of variational autoencoders to avoid the over-regularisation resulting from a standard normal prior distribution, and introduces a graph-based interpolation method, which shows that the topology of the learned latent representation corresponds to the topology of the data manifold.
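The prior in question is hierarchical in the sense that $p(z)$ is itself a latent-variable model, e.g. $p_\theta(z) = \int p_\theta(z|\zeta)\,p(\zeta)\,d\zeta$ (notation mine), learned jointly with the VAE instead of being fixed to $\mathcal{N}(0, I)$.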