Corpus ID: 877639

Deep Boltzmann Machines

@inproceedings{Salakhutdinov2009DeepBM,
  title={Deep Boltzmann Machines},
  author={Ruslan Salakhutdinov and Geoffrey E. Hinton},
  booktitle={International Conference on Artificial Intelligence and Statistics},
  year={2009}
}
We present a new learning algorithm for Boltzmann machines that contain many layers of hidden variables. Data-dependent expectations are estimated using a variational approximation that tends to focus on a single mode, and data-independent expectations are approximated using persistent Markov chains. The use of two quite different techniques for estimating the two types of expectation that enter into the gradient of the log-likelihood makes it practical to learn Boltzmann machines with multiple…
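
To make the abstract's two-estimator gradient concrete, below is a minimal numpy sketch for a two-layer binary DBM. It is an illustration under assumptions, not the authors' implementation: biases are omitted, and the weight matrices W1 and W2, the fixed mean-field iteration count, and all function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field(v, W1, W2, n_iters=10):
    """Data-dependent phase: mean-field fixed-point updates for a
    two-layer DBM; the factorized posterior tends to focus on a
    single mode, as the abstract notes."""
    mu2 = np.full((v.shape[0], W2.shape[1]), 0.5)
    for _ in range(n_iters):
        mu1 = sigmoid(v @ W1 + mu2 @ W2.T)   # h1 gets input from v and h2
        mu2 = sigmoid(mu1 @ W2)              # h2 gets input from h1
    return mu1, mu2

def gibbs_step(v, h1, h2, W1, W2, rng):
    """Data-independent phase: one alternating Gibbs sweep on a
    persistent fantasy chain of binary states."""
    h1 = rng.binomial(1, sigmoid(v @ W1 + h2 @ W2.T)).astype(float)
    v = rng.binomial(1, sigmoid(h1 @ W1.T)).astype(float)
    h2 = rng.binomial(1, sigmoid(h1 @ W2)).astype(float)
    return v, h1, h2

def dbm_gradient(v_data, chain, W1, W2, rng):
    """Log-likelihood gradient: variational data-dependent expectations
    minus persistent-chain model expectations."""
    mu1, mu2 = mean_field(v_data, W1, W2)
    v_f, h1_f, h2_f = gibbs_step(*chain, W1, W2, rng)
    dW1 = v_data.T @ mu1 / len(v_data) - v_f.T @ h1_f / len(v_f)
    dW2 = mu1.T @ mu2 / len(v_data) - h1_f.T @ h2_f / len(v_f)
    return dW1, dW2, (v_f, h1_f, h2_f)
```

Each update combines the two estimators: mean-field supplies the data-dependent statistics, one Gibbs sweep on the persistent fantasy chain (initialized once with random binary states and carried across updates) supplies the data-independent statistics, and their difference approximates the gradient.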

Citations

An Efficient Learning Procedure for Deep Boltzmann Machines

A new learning algorithm for Boltzmann machines that contain many layers of hidden variables is presented, and results on the MNIST and NORB data sets show that deep Boltzmann machines learn very good generative models of handwritten digits and 3D objects.

Efficient Learning of Deep Boltzmann Machines

We present a new approximate inference algorithm for Deep Boltzmann Machines (DBMs), a generative model with many layers of hidden variables. The algorithm learns a separate "recognition" model that is used to quickly initialize, in a single bottom-up pass, the values of the latent variables in all hidden layers.

A Two-Stage Pretraining Algorithm for Deep Boltzmann Machines

This paper shows empirically that the proposed method overcomes the difficulty in training DBMs from randomly initialized parameters and results in a better, or comparable, generative model when compared to the conventional pretraining algorithm.

Joint Training Deep Boltzmann Machines for Classification

This work introduces a new method for training deep Boltzmann machines jointly, and shows that this approach performs competitively for classification and outperforms previous methods in the accuracy of approximate inference and in classification with missing inputs.

How to Pretrain Deep Boltzmann Machines in Two Stages

This paper shows empirically that the proposed method overcomes the difficulty in training DBMs from randomly initialized parameters and results in a better, or comparable, generative model when compared to the conventional pretraining algorithm.

Variational EM Learning of DSBNs with Conditional Deep Boltzmann Machines

This paper describes a variational EM learning method for DSBNs with a new inference model known as the conditional deep Boltzmann machine (cDBM), an undirected graphical model capable of representing complex dependencies among latent variables.

Gaussian-Bernoulli deep Boltzmann machine

Improvements to the learning algorithm for the GDBM help avoid common difficulties in training deep Boltzmann machines, such as divergence of learning, the difficulty of choosing a good learning-rate schedule, and the existence of meaningless higher layers.

An Infinite Deep Boltzmann Machine

Experimental results indicate that the iDBM can learn generative and discriminative models as good as the original DBM's, and that it successfully eliminates the requirement of model selection for the hidden-layer sizes of DBMs.

An Introduction to Restricted Boltzmann Machines

This tutorial introduces RBMs as undirected graphical models that serve as building blocks of multi-layer learning systems called deep belief networks, and covers training algorithms based on Markov chain Monte Carlo methods.
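
For reference, the standard RBM definitions such a tutorial builds on, in the usual notation (visible units v, hidden units h, weights W, biases b and c); the factorized conditionals are what make block Gibbs sampling, and hence MCMC-based training, convenient:

```latex
E(\mathbf{v},\mathbf{h}) = -\mathbf{b}^{\top}\mathbf{v} - \mathbf{c}^{\top}\mathbf{h} - \mathbf{v}^{\top} W \mathbf{h},
\qquad
p(\mathbf{v},\mathbf{h}) = \frac{e^{-E(\mathbf{v},\mathbf{h})}}{Z},
\qquad
Z = \sum_{\mathbf{v},\mathbf{h}} e^{-E(\mathbf{v},\mathbf{h})}

p(h_j = 1 \mid \mathbf{v}) = \sigma\!\Big(c_j + \sum_i W_{ij} v_i\Big),
\qquad
p(v_i = 1 \mid \mathbf{h}) = \sigma\!\Big(b_i + \sum_j W_{ij} h_j\Big)
```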

Enhanced Gradient and Adaptive Learning Rate for Training Restricted Boltzmann Machines

This work presents an enhanced gradient, derived to be invariant to bit-flipping transformations, and proposes a way to automatically adjust the learning rate by maximizing a local likelihood estimate.
...

References

Showing 1-10 of 22 references.

Implicit Mixtures of Restricted Boltzmann Machines

Results for the MNIST and NORB datasets are presented showing that the implicit mixture of RBMs learns clusters that reflect the class structure in the data.

Learning and Evaluating Boltzmann Machines

An annealed importance sampling (AIS) procedure for estimating partition functions of restricted Boltzmann machines (RBMs), semi-restricted Boltzmann machines (SRBMs), and Boltzmann machines (BMs) is developed, and empirical results indicate that the AIS procedure provides much better estimates of the partition function than some of the popular variational-based methods.
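
As a rough illustration of the AIS procedure, here is a hedged numpy/scipy sketch for a bias-free binary RBM: anneal samples from the uniform distribution at beta = 0 to the target model at beta = 1, accumulating log importance weights along the way. The linear temperature schedule, chain count, and function names are assumptions for exposition, not the paper's setup.

```python
import numpy as np
from scipy.special import logsumexp

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_p_star(v, W, beta):
    """Log unnormalized probability of v under a bias-free RBM at
    inverse temperature beta, with hidden units summed out."""
    return np.logaddexp(0.0, beta * (v @ W)).sum(axis=1)

def ais_log_z(W, n_chains=100, n_betas=1000, rng=None):
    """AIS estimate of log Z: anneal from the uniform base model
    (beta=0, Z known in closed form) to the target RBM (beta=1)."""
    rng = rng or np.random.default_rng(0)
    nv, nh = W.shape
    betas = np.linspace(0.0, 1.0, n_betas)
    v = rng.binomial(1, 0.5, size=(n_chains, nv)).astype(float)
    log_w = np.zeros(n_chains)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        log_w += log_p_star(v, W, b) - log_p_star(v, W, b_prev)
        # one Gibbs transition that leaves the intermediate p_b invariant
        h = rng.binomial(1, sigmoid(b * (v @ W)))
        v = rng.binomial(1, sigmoid(b * (h @ W.T))).astype(float)
    log_z0 = (nv + nh) * np.log(2.0)   # partition function of the base
    return log_z0 + logsumexp(log_w) - np.log(n_chains)
```

The resulting estimate can then be compared against variational bounds, which is essentially the comparison the paper reports.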

A Fast Learning Algorithm for Deep Belief Nets

A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
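
The layer-at-a-time recipe is compact enough to sketch, assuming some single-RBM trainer train_rbm (hypothetical) and a deterministic up-pass between layers; the actual algorithm additionally ties weights during training and treats the top two layers as an undirected associative memory.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def greedy_pretrain(data, hidden_sizes, train_rbm):
    """Greedy layer-by-layer training: fit an RBM to the current
    representation, then push hidden probabilities upward to act as
    the training data for the next layer's RBM."""
    weights, x = [], data
    for nh in hidden_sizes:
        W = train_rbm(x, nh)      # any single-RBM trainer (e.g. CD)
        weights.append(W)
        x = sigmoid(x @ W)        # deterministic up-pass to next layer
    return weights
```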

Connectionist Learning of Belief Networks

On the quantitative analysis of deep belief networks

It is shown that annealed importance sampling (AIS) can be used to efficiently estimate the partition function of an RBM, and a novel AIS scheme for comparing RBMs with different architectures is presented.

A New Learning Algorithm for Mean Field Boltzmann Machines

A new learning algorithm for mean-field Boltzmann machines based on the contrastive divergence optimization criterion eliminates the need to estimate equilibrium statistics, so it does not need to approximate the multimodal probability distribution of the free network with the unimodal mean-field distribution.

Training restricted Boltzmann machines using approximations to the likelihood gradient

A new algorithm for training Restricted Boltzmann Machines is introduced, which is compared to some standard Contrastive Divergence and Pseudo-Likelihood algorithms on the tasks of modeling and classifying various types of data.

Scaling learning algorithms towards AI

It is argued that deep architectures have the potential to generalize in non-local ways, i.e., beyond immediate neighbors, and that this is crucial in order to make progress on the kind of complex tasks required for artificial intelligence.

Reducing the Dimensionality of Data with Neural Networks

This work describes an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.

Training Products of Experts by Minimizing Contrastive Divergence

A product of experts (PoE) is an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary; because it is hard even to approximate the derivatives of the renormalization term in the combination rule, learning instead minimizes an objective called contrastive divergence.
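
The contrastive-divergence workaround lends itself to a short sketch: since the derivative of the renormalization (partition) term is intractable, the negative statistics come from a single reconstruction step started at the data. The numpy code below is illustrative (bias-free RBM, hypothetical names), not the paper's algorithm verbatim.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, lr=0.05, rng=None):
    """One CD-1 step for a bias-free binary RBM: replace the intractable
    model expectation with statistics from a one-step reconstruction."""
    rng = rng or np.random.default_rng(0)
    h0 = sigmoid(v0 @ W)                      # positive phase, given data
    v1 = sigmoid(rng.binomial(1, h0) @ W.T)   # one-step reconstruction
    h1 = sigmoid(v1 @ W)                      # negative phase
    W += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
    return W
```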