An Efficient Learning Procedure for Deep Boltzmann Machines

@article{Salakhutdinov2012AnEL,
  title={An Efficient Learning Procedure for Deep Boltzmann Machines},
  author={Ruslan Salakhutdinov and Geoffrey E. Hinton},
  journal={Neural Computation},
  year={2012},
  volume={24},
  pages={1967-2006}
}
We present a new learning algorithm for Boltzmann machines that contain many layers of hidden variables. Data-dependent statistics are estimated using a variational approximation that tends to focus on a single mode, and data-independent statistics are estimated using persistent Markov chains. The use of two quite different techniques for estimating the two types of statistic that enter into the gradient of the log likelihood makes it practical to learn Boltzmann machines with multiple hidden layers.
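The two estimators named in the abstract pair naturally in code. The sketch below is a hedged illustration, not the authors' implementation: a two-hidden-layer binary DBM with no bias terms and made-up layer sizes, where mean-field fixed-point updates supply the data-dependent statistics and persistent Gibbs chains supply the data-independent ones.

```python
# Minimal sketch of one DBM gradient step (assumptions: two hidden layers,
# binary units, no bias terms, toy sizes; not the authors' code).
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def mean_field(v, W1, W2, steps=10):
    """Data-dependent phase: fixed-point updates for q(h1)q(h2) given data v."""
    mu1 = sigmoid(v @ W1)                    # single bottom-up pass to start
    mu2 = sigmoid(mu1 @ W2)
    for _ in range(steps):
        mu1 = sigmoid(v @ W1 + mu2 @ W2.T)   # h1 receives input from v and h2
        mu2 = sigmoid(mu1 @ W2)
    return mu1, mu2

def gibbs_sweep(v, h2, W1, W2):
    """Data-independent phase: one sweep of the persistent chains.
    Given h1, the layers v and h2 are conditionally independent."""
    p1 = sigmoid(v @ W1 + h2 @ W2.T)
    h1 = (rng.random(p1.shape) < p1).astype(float)
    pv = sigmoid(h1 @ W1.T)
    p2 = sigmoid(h1 @ W2)
    v = (rng.random(pv.shape) < pv).astype(float)
    h2 = (rng.random(p2.shape) < p2).astype(float)
    return v, h1, h2

def dbm_step(batch, chains, W1, W2, lr=0.05):
    """One stochastic ascent step on the log-likelihood: positive statistics
    from mean-field, negative statistics from the persistent chains.
    Weights are updated in place."""
    mu1, mu2 = mean_field(batch, W1, W2)
    v, h1, h2 = gibbs_sweep(*chains, W1, W2)
    W1 += lr * (batch.T @ mu1 / len(batch) - v.T @ h1 / len(v))
    W2 += lr * (mu1.T @ mu2 / len(batch) - h1.T @ h2 / len(v))
    return v, h2                             # chain state persists across steps

# Toy run with hypothetical sizes: 20 visible units, hidden layers of 16 and 8.
W1 = 0.01 * rng.standard_normal((20, 16))
W2 = 0.01 * rng.standard_normal((16, 8))
chains = ((rng.random((100, 20)) < 0.5).astype(float),
          (rng.random((100, 8)) < 0.5).astype(float))
for _ in range(200):
    batch = (rng.random((64, 20)) < 0.5).astype(float)  # stand-in for data
    chains = dbm_step(batch, chains, W1, W2)
```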
Citations

An Infinite Deep Boltzmann Machine
Experimental results indicate that iDBM can learn a generative and discriminative model as good as the original DBM, and has successfully eliminated the requirement of model selection for hidden layer sizes of DBMs.
A Two-Stage Pretraining Algorithm for Deep Boltzmann Machines
This paper shows empirically that the proposed method overcomes the difficulty in training DBMs from randomly initialized parameters and results in a better, or comparable, generative model when compared to the conventional pretraining algorithm.
Where Do Features Come From?
Using a stack of RBMs to initialize the weights of a feedforward neural network allows backpropagation to work effectively in much deeper networks and leads to much better generalization.
On the Training Algorithms for Restricted Boltzmann Machines
L. A. Passos and J. Papa. Anais Estendidos da Conference on Graphics, Patterns and Images (SIBGRAPI), 2019.
The validation of the model is presented in the context of image reconstruction and unsupervised feature learning, and its main contributions are: temperature parameter introduction in the DBM formulation, DBM using adaptive temperature, and DBM meta-parameter optimization through meta-heuristic techniques.
Deep Restricted Boltzmann Networks
A new method to compose RBMs into a multi-layer network architecture, together with a training method that trains all layers jointly; the resulting model can generate decent images and significantly outperforms a normal RBM in image quality and feature quality, without losing much efficiency for training.
Notes on Boltzmann Machines
Boltzmann machines are probability distributions on high-dimensional binary vectors which are analogous to Gaussian Markov random fields in that they are fully determined by first- and second-order moments.
On the propriety of restricted Boltzmann machines
The relationship between RBM parameter specification in the binary case and the tendency toward undesirable model properties such as degeneracy, instability, and uninterpretability is discussed.
How to Pretrain Deep Boltzmann Machines in Two Stages
This paper shows empirically that the proposed method overcomes the difficulty in training DBMs from randomly initialized parameters and results in a better, or comparable, generative model when compared to the conventional pretraining algorithm.
Relationship between PreTraining and Maximum Likelihood Estimation in Deep Boltzmann Machines
A pretraining algorithm, a layer-by-layer greedy learning algorithm for a deep Boltzmann machine (DBM), is presented, with a guarantee that the pretraining improves the variational bound on the true log-likelihood of the DBM.
Properties and Bayesian fitting of restricted Boltzmann machines
The relationship between RBM parameter specification in the binary case and model properties such as degeneracy, instability, and uninterpretability is discussed, along with potential Bayesian fitting of such (highly flexible) models.

References

Showing 1–10 of 72 references.
Deep Boltzmann Machines
A new learning algorithm for Boltzmann machines that contain many layers of hidden variables, made more efficient by using a layer-by-layer “pre-training” phase that allows variational inference to be initialized with a single bottom-up pass.
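As a hedged companion to the reference above, here is one minimal reading of the layer-by-layer pre-training idea: train a stack of RBMs with CD-1, each on the hidden activities of the layer below, and reuse the stacked weights for the single bottom-up pass that initializes variational inference. Binary units, absent biases, CD-1, and the toy sizes are assumptions, and the weight-rescaling details of the full DBM recipe are omitted.

```python
# Sketch of greedy layer-by-layer pre-training with CD-1 (assumptions:
# binary units, no biases, toy sizes; rescaling details omitted).
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=10, lr=0.05):
    """Train one RBM with a single step of contrastive divergence (CD-1)."""
    W = 0.01 * rng.standard_normal((data.shape[1], n_hidden))
    for _ in range(epochs):
        ph = sigmoid(data @ W)                       # positive phase
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid(h @ W.T)                        # one reconstruction
        ph2 = sigmoid(pv @ W)                        # negative phase
        W += lr * (data.T @ ph - pv.T @ ph2) / len(data)
    return W

def pretrain_stack(data, layer_sizes):
    """Greedy stack: each RBM is trained on the layer below's activities."""
    weights, x = [], data
    for n in layer_sizes:
        W = train_rbm(x, n)
        weights.append(W)
        x = sigmoid(x @ W)       # deterministic "up" pass feeds the next RBM
    return weights

# The stacked weights then provide the single bottom-up pass that
# initializes mean-field inference in the DBM.
data = (rng.random((200, 20)) < 0.5).astype(float)   # stand-in binary data
W1, W2 = pretrain_stack(data, [16, 8])
```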
Efficient Learning of Deep Boltzmann Machines
We present a new approximate inference algorithm for Deep Boltzmann Machines (DBMs), a generative model with many layers of hidden variables. The algorithm learns a separate “recognition” model that is used to quickly initialize, in a single bottom-up pass, the values of the latent variables in all hidden layers.
A Fast Learning Algorithm for Deep Belief Nets
A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
On the quantitative analysis of deep belief networks
It is shown that Annealed Importance Sampling (AIS) can be used to efficiently estimate the partition function of an RBM, and a novel AIS scheme for comparing RBMs with different architectures is presented.
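For concreteness, a small sketch of the AIS idea referenced above, under simplifying assumptions (a bias-free binary RBM, a linear inverse-temperature schedule, and arbitrary run counts; not the paper's exact setup): chains are run through a sequence of tempered RBMs with weights beta*W, and the accumulated importance weights estimate the log partition function.

```python
# Hedged sketch of AIS for the partition function of a bias-free binary RBM.
# The schedule, run counts, and bias-free energy are assumptions.
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def log_pstar(v, W, beta):
    """Unnormalized log p(v) with hidden units summed out, at temperature beta."""
    return np.logaddexp(0.0, beta * (v @ W)).sum(axis=1)

def ais_log_z(W, n_runs=100, n_betas=1000):
    n_v, n_h = W.shape
    betas = np.linspace(0.0, 1.0, n_betas)
    # Exact samples from the beta=0 base model (uniform over visible vectors).
    v = (rng.random((n_runs, n_v)) < 0.5).astype(float)
    log_w = np.zeros(n_runs)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        log_w += log_pstar(v, W, b) - log_pstar(v, W, b_prev)
        ph = sigmoid(b * (v @ W))            # one Gibbs sweep at temperature b
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid(b * (h @ W.T))
        v = (rng.random(pv.shape) < pv).astype(float)
    log_z0 = (n_v + n_h) * np.log(2.0)       # partition function of the base RBM
    return log_z0 + logsumexp(log_w) - np.log(n_runs)

W = 0.1 * rng.standard_normal((10, 8))       # small enough to sanity-check
print(ais_log_z(W))
```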
Implicit Mixtures of Restricted Boltzmann Machines
Results for the MNIST and NORB datasets are presented showing that the implicit mixture of RBMs learns clusters that reflect the class structure in the data.
Greedy Layer-Wise Training of Deep Networks
These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
Connectionist Learning of Belief Networks
R. Neal. Artificial Intelligence, 1992.
The “Gibbs sampling” simulation procedure for “sigmoid” and “noisy-OR” varieties of probabilistic belief networks can support maximum-likelihood learning from empirical data through local gradient ascent.
Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine
This work uses the mean-covariance restricted Boltzmann machine (mcRBM) to learn features of speech data that serve as input into a standard DBN, and achieves a phone error rate superior to all published results on speaker-independent TIMIT to date.
Learning Deep Boltzmann Machines using Adaptive MCMC
This paper first shows a close connection between Fast PCD and adaptive MCMC, and then develops a Coupled Adaptive Simulated Tempering algorithm that can be used to better explore a highly multimodal energy landscape.
Learning Deep Architectures for AI
The motivations and principles regarding learning algorithms for deep architectures are discussed, in particular those exploiting as building blocks unsupervised learning of single-layer models, such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.