• Corpus ID: 1334653

Better Mixing via Deep Representations

Yoshua Bengio, Grégoire Mesnil, Yann Dauphin, Salah Rifai
International Conference on Machine Learning

It has been hypothesized, and supported with experimental evidence, that deeper representations, when well trained, tend to do a better job at disentangling the underlying factors of variation. We study the following related conjecture: better representations, in the sense of better disentangling, can be exploited to produce Markov chains that mix faster between modes. Consequently, mixing between modes would be more efficient at higher levels of representation. To better understand this, we… 

Why Deep Learning Works: A Manifold Disentanglement Perspective

This paper provides quantitative evidence for the flattening hypothesis, proposes a few quantities for measuring manifold entanglement under certain assumptions, and conducts experiments on both synthetic and real-world data that validate the proposition and lead to new insights on deep learning.

Deep Learning of Representations: Looking Forward

This paper proposes to examine some of the challenges of scaling deep learning algorithms to much larger models and datasets, reducing optimization difficulties due to ill-conditioning or local minima, designing more efficient and powerful inference and sampling procedures, and learning to disentangle the factors of variation underlying the observed data.

Understanding and Deep Learning Representation

It is argued that coupling analyses from philosophical accounts with the empirical and theoretical basis for identifying these factors in deep learning representations provides a framework for discussing and critically evaluating potential machine understanding given the continually improving task performance enabled by such algorithms.

Encouraging Disentangled and Convex Representation with Controllable Interpolation Regularization

This work proposes a simple yet efficient method: Controllable Interpolation Regularization (CIR), which creates a positive loop where disentanglement and convexity can help each other and improves downstream tasks: controllable image synthesis, cross-modality image translation and zero-shot synthesis.

Concept Formation and Dynamics of Repeated Inference in Deep Generative Models

This study demonstrates that the transient dynamics of inference first approach a concept and then move close to a memory, and that the dynamics approach a more abstract concept as noise increases the uncertainty of the input data.

Deep Learning of Representations

This chapter reviews the main motivations and ideas behind deep learning algorithms and their representation-learning components, as well as recent results, and proposes a vision of challenges and hopes on the road ahead, focusing on the questions of invariance and disentangling.

Representation Learning: A Review and New Perspectives

Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.

Collective dynamics of repeated inference in variational autoencoder rapidly find cluster structure

This study numerically analyzed how the activity pattern of trained networks changes in the latent space during inference and revealed that when a cluster structure exists in the dataset, the trajectory rapidly approaches the center of the cluster.

Channel-Recurrent Variational Autoencoders

This paper proposes to integrate recurrent connections across channels into both the inference and generation steps of a VAE, and shows that the channel-recurrent VAE improves on existing approaches in multiple aspects.

Attentive Conditional Channel-Recurrent Autoencoding for Attribute-Conditioned Face Synthesis

This work augments the pathways connecting the latent space with channel-recurrent architecture in order to provide not only improved generation qualities but also interpretable high-level features in attribute-conditioned face synthesis.



A Generative Process for Contractive Auto-Encoders

A procedure is proposed for generating samples that are consistent with the local structure captured by a contractive auto-encoder; experimentally, it appears to converge quickly and to mix well between modes compared to Restricted Boltzmann Machines and Deep Belief Networks.

A Generative Process for sampling Contractive Auto-Encoders

A procedure is proposed for generating samples that are consistent with the local structure captured by a contractive auto-encoder; experimentally, it appears to converge quickly and to mix well between modes compared to Restricted Boltzmann Machines and Deep Belief Networks.

Tempered Markov Chain Monte Carlo for training of Restricted Boltzmann Machines

This work explores the use of tempered Markov Chain Monte Carlo for sampling in RBMs and finds, through both visualization of samples and measures of likelihood, that it helps both sampling and learning.

Learning Deep Architectures for AI

The motivations and principles behind learning algorithms for deep architectures are discussed, in particular those that use unsupervised learning of single-layer models, such as Restricted Boltzmann Machines, as building blocks for deeper models such as Deep Belief Networks.

Scaling learning algorithms towards AI

It is argued that deep architectures have the potential to generalize in non-local ways, i.e., beyond immediate neighbors, and that this is crucial in order to make progress on the kind of complex tasks required for artificial intelligence.

Learning Deep Boltzmann Machines using Adaptive MCMC

This paper first shows a close connection between Fast PCD and adaptive MCMC, and develops a Coupled Adaptive Simulated Tempering algorithm that can be used to better explore a highly multimodal energy landscape.

Regularized Auto-Encoders Estimate Local Statistics

It is shown that minimizing a particular form of regularized reconstruction error yields a reconstruction function that locally characterizes the shape of the data generating density.

A Fast Learning Algorithm for Deep Belief Nets

A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.

The Manifold Tangent Classifier

A representation learning algorithm can be stacked to yield a deep architecture and it is shown how it builds a topological atlas of charts, each chart being characterized by the principal singular vectors of the Jacobian of a representation mapping.

Learning Many Related Tasks at the Same Time with Backpropagation

This work shows that a backprop net learning many related tasks at the same time can use these tasks as inductive bias for each other and thus learn better, and gives empirical evidence that multitask backprop generalizes better in real domains.