• Corpus ID: 46798026

beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework

Irina Higgins, Loïc Matthey, Arka Pal, Christopher P. Burgess, Xavier Glorot, Matthew M. Botvinick, Shakir Mohamed, Alexander Lerchner
Learning an interpretable factorised representation of the independent data generative factors of the world without supervision is an important precursor for the development of artificial intelligence that is able to learn and reason in the same way that humans do. […] We introduce an adjustable hyperparameter beta that balances latent channel capacity and independence constraints with reconstruction accuracy. We demonstrate that beta-VAE with appropriately tuned beta > 1 qualitatively outperforms…
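
The abstract excerpt above describes beta only in words. As a rough illustration, the sketch below shows the commonly cited form of the objective: a reconstruction term plus a KL term weighted by beta. The diagonal-Gaussian posterior, the standard-normal prior, the Bernoulli decoder, the function names, and the default `beta=4.0` are all assumptions made for this example; none of them is taken from the excerpt.

```python
import numpy as np


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims.

    Assumes a diagonal Gaussian posterior and a standard-normal prior.
    """
    return 0.5 * np.sum(mu ** 2 + np.exp(log_var) - 1.0 - log_var, axis=-1)


def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Per-example loss: reconstruction error + beta * KL.

    Assumes pixel values in [0, 1] and a Bernoulli decoder, so the
    reconstruction term is a binary cross-entropy summed over pixels.
    beta = 1 gives the standard VAE loss; beta > 1 weights the KL term
    more heavily relative to reconstruction accuracy.
    """
    eps = 1e-7  # avoid log(0)
    p = np.clip(x_recon, eps, 1.0 - eps)
    recon = -np.sum(x * np.log(p) + (1.0 - x) * np.log(1.0 - p), axis=-1)
    return recon + beta * gaussian_kl(mu, log_var)
```

With the other inputs held fixed, raising `beta` increases the loss by `(beta - 1)` times the KL term, which is how the hyperparameter trades reconstruction accuracy against the constraint on the latent code.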


Analyzing disentanglement of visual objects in semi-supervised neural networks
This work trained and analyzed a popular DNN model of disentanglement, the β-variational autoencoder (β-VAE) on a new dataset, containing a “foreground” white circle and “background” isotropic Gaussian, and proposed that further inductive bias is needed to achieve better disentangling, such as a representation which factorizes static properties and their dynamics.
This paper addresses the task of disentanglement and introduces a new state-of-the-art approach called Non-synergistic variational Autoencoder (Non-Syn VAE), where the notion of synergy arises when the encoded information by neurons in the form of responses from the stimuli is described.
Isolating Sources of Disentanglement in VAEs
A decomposition of the variational lower bound is shown that can be used to explain the success of the β-VAE in learning disentangled representations, and a new information-theoretic disentanglement metric is proposed, which is classifier-free and generalizable to arbitrarily-distributed and non-scalar latent variables.
Semi-Supervised Disentanglement of Class-Related and Class-Independent Factors in VAE
This work proposes a framework capable of disentangling class-related and class-independent factors of variation in data and employs an attention mechanism in its latent space in order to improve the process of extracting class-related factors from data.
Disentangling in Variational Autoencoders with Natural Clustering
N-VAE is presented, a model which is capable of separating factors of variation which are exclusive to certain classes from factors that are shared among classes and implements an explicitly compositional latent variable structure by defining a class-conditioned latent space and a shared latent space.
Hyperprior Induced Unsupervised Disentanglement of Latent Representations
It is argued that statistical independence in the latent space of VAEs can be enforced in a principled hierarchical Bayesian manner to augment the standard VAE with an inverse-Wishart (IW) prior on the covariance matrix of the latent code.
Visual Representation Learning Does Not Generalize Strongly Within the Same Domain
This paper tests whether 17 unsupervised, weakly supervised, and fully supervised representation learning approaches correctly infer the generative factors of variation in simple datasets and observes that all of them struggle to learn the underlying mechanism regardless of supervision signal and architectural bias.
Independent Subspace Analysis for Unsupervised Learning of Disentangled Representations
It is demonstrated that the proposed prior significantly mitigates the trade-off between reconstruction loss and disentanglement over the state of the art and resolves the problem of unidentifiability of the standard VAE normal prior.
Reference-based Variational Autoencoders
The ability of the proposed reference-based variational autoencoders, a novel deep generative model designed to exploit the weak-supervision provided by the reference set, to learn disentangled representations from this minimal form of supervision is validated.
WeLa-VAE: Learning Alternative Disentangled Representations Using Weak Labels
This paper considers weak supervision by means of high-level labels that are not assumed to be explicitly related to the ground truth factors, which involves the maximization of a modified variational lower bound and total correlation regularization in WeLa-VAE, a variational inference framework where observations and labels share the same latent variables.


Learning to Disentangle Factors of Variation with Manifold Interaction
A higher-order Boltzmann machine that incorporates multiplicative interactions among groups of hidden units that each learn to encode a distinct factor of variation is proposed and achieves state-of-the-art emotion recognition and face verification performance on the Toronto Face Database.
Learning to Linearize Under Uncertainty
This work suggests a new architecture and loss for training deep feature hierarchies that linearize the transformations observed in unlabeled natural video sequences by training a generative model to predict video frames.
Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis
A novel recurrent convolutional encoder-decoder network that is trained end-to-end on the task of rendering rotated objects starting from a single image and allows the model to capture long-term dependencies along a sequence of transformations.
Disentangling Factors of Variation via Generative Entangling
This work proposes a novel model family based on the spike-and-slab restricted Boltzmann machine, which is generalized to include higher-order interactions among multiple latent variables, and applies it to the task of facial expression classification.
Representation Learning: A Review and New Perspectives
Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.
Discovering Hidden Factors of Variation in Deep Networks
A cross-covariance penalty (XCov) is introduced as a method to disentangle factors like handwriting style for digits and subject identity in faces by augmenting autoencoders with simple regularization terms during training to demonstrate that standard deep architectures can discover and represent factors of variation beyond those relevant for categorization.
Generative Adversarial Nets
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a
Tensor Analyzers
An efficient way of sampling from the posterior distribution over factor values is described and it is demonstrated that these samples can be used in the EM algorithm for learning interesting mixture models of natural image patches.
High-Dimensional Probability Estimation with Deep Density Models
The deep density model (DDM) is introduced, a new approach to density estimation that exploits insights from deep learning to construct a bijective map to a representation space, under which the transformation of the distribution of the data is approximately factorized and has identical and known marginal densities.
Adam: A Method for Stochastic Optimization
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.