# A Fast Learning Algorithm for Deep Belief Nets

@article{Hinton2006AFL, title={A Fast Learning Algorithm for Deep Belief Nets}, author={Geoffrey E. Hinton and Simon Osindero and Yee Whye Teh}, journal={Neural Computation}, year={2006}, volume={18}, pages={1527-1554} }

We show how to use complementary priors to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive…
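The layer-at-a-time idea in the abstract can be illustrated with a minimal sketch. It assumes each layer is a binary restricted Boltzmann machine trained with one-step contrastive divergence on full batches, which the abstract itself does not state; the function names `train_rbm` and `greedy_pretrain` and all hyperparameters are hypothetical, and the slower fine-tuning stage is omitted entirely.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_rbm(data, n_hidden, epochs=5, lr=0.1, rng=None):
    """Train one binary RBM with one-step contrastive divergence (CD-1).

    Assumption: full-batch updates and fixed learning rate, chosen only
    to keep the sketch short. Returns (W, b_vis, b_hid).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n_samples, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden probabilities given the data.
        h_prob = sigmoid(data @ W + b_hid)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one Gibbs step back to the visibles and up again.
        v_recon = sigmoid(h_sample @ W.T + b_vis)
        h_recon = sigmoid(v_recon @ W + b_hid)
        # CD-1 update: data correlations minus reconstruction correlations.
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n_samples
        b_vis += lr * (data - v_recon).mean(axis=0)
        b_hid += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_vis, b_hid


def greedy_pretrain(data, layer_sizes, **rbm_kwargs):
    """Stack RBMs greedily: each layer trains on the layer below's output."""
    layers = []
    x = data
    for n_hidden in layer_sizes:
        W, b_vis, b_hid = train_rbm(x, n_hidden, **rbm_kwargs)
        layers.append((W, b_vis, b_hid))
        # Hidden probabilities become the "data" for the next layer.
        x = sigmoid(x @ W + b_hid)
    return layers
```

Calling `greedy_pretrain(toy, [4, 3])` on a 20-by-6 binary array `toy` returns two parameter tuples whose weight matrices have shapes `(6, 4)` and `(4, 3)`. Each layer is trained in isolation and then frozen, which is what makes the procedure greedy.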

#### 12,886 Citations

Sparse Deep Belief Net for Handwritten Digits Classification

- Computer Science
- AICI
- 2010

Another version of Sparse Deep Belief Net is proposed which applies the differentiable sparse coding method to train the first level of the deep network, and then train the higher layers with RBM, which leads to state-of-the-art performance on the classification of handwritten digits.

A Novel Sparse Deep Belief Network for Unsupervised Feature Learning

- Computer Science
- 2012

A novel version of sparse deep belief network for unsupervised feature extraction that learns hierarchical representations which mimic computations in the cortical hierarchy, and obtains a more discriminative representation than PCA and several basic algorithms of deep belief networks.

Efficient Learning of Deep Boltzmann Machines

- Mathematics, Computer Science
- AISTATS
- 2010

We present a new approximate inference algorithm for Deep Boltzmann Machines (DBM’s), a generative model with many layers of hidden variables. The algorithm learns a separate “recognition” model that…

Exploring Strategies for Training Deep Neural Networks

- Computer Science
- J. Mach. Learn. Res.
- 2009

These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy helps the optimization by initializing weights in a region near a good local minimum, but also implicitly acts as a sort of regularization that brings better generalization and encourages internal distributed representations that are high-level abstractions of the input.

An Efficient Learning Procedure for Deep Boltzmann Machines

- Medicine, Computer Science
- Neural Computation
- 2012

A new learning algorithm for Boltzmann machines that contain many layers of hidden variables is presented, and results on the MNIST and NORB data sets show that deep Boltzmann machines learn very good generative models of handwritten digits and 3D objects.

On the quantitative analysis of deep belief networks

- Mathematics, Computer Science
- ICML '08
- 2008

It is shown that Annealed Importance Sampling (AIS) can be used to efficiently estimate the partition function of an RBM, and a novel AIS scheme for comparing RBM's with different architectures is presented.

Unsupervised feature learning using Markov deep belief network

- Computer Science
- 2013 IEEE International Conference on Image Processing
- 2013

A new deep learning model, named Markov DBN (MDBN), is proposed to address problems of DBNs; it employs a new way for a DBN to reduce the computational burden and handle large images.

Partitioning Large Scale Deep Belief Networks Using Dropout

- Computer Science, Mathematics
- ArXiv
- 2015

This work considers a well-known machine learning model, deep belief networks (DBNs), and proposes an approach that can use the computing clusters in a distributed environment to train large models, while the dense matrix computations within a single machine are sped up using graphics processors (GPU).

Modular deep belief networks that do not forget

- Computer Science
- The 2011 International Joint Conference on Neural Networks
- 2011

The M-DBN is introduced, an unsupervised modular DBN that addresses the forgetting problem and retains learned features even after those features are removed from the training data, while monolithic DBNs of comparable size forget feature mappings learned before.

Greedy Layer-Wise Training of Deep Networks

- Computer Science
- NIPS
- 2006

These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.

#### References

SHOWING 1-10 OF 33 REFERENCES

Visual Recognition and Inference Using Dynamic Overcomplete Sparse Learning

- Computer Science, Medicine
- Neural Computation
- 2007

It is shown that increasing the degree of overcompleteness improves recognition performance in difficult scenes with occluded objects in clutter and takes advantage of the sparsity of each layer to update only a small subset of elements in a large weight matrix at each iteration.

Connectionist Learning of Belief Networks

- Computer Science
- Artif. Intell.
- 1992

The “Gibbs sampling” simulation procedure for “sigmoid” and “noisy-OR” varieties of probabilistic belief networks can support maximum-likelihood learning from empirical data through local gradient ascent.

Rate-coded Restricted Boltzmann Machines for Face Recognition

- Computer Science
- NIPS
- 2000

We describe a neurally-inspired, unsupervised learning algorithm that builds a non-linear generative model for pairs of face images from the same individual. Individuals are then recognized by…

On Contrastive Divergence Learning

- Computer Science
- AISTATS
- 2005

The properties of CD learning are studied and it is shown that it provides biased estimates in general, but that the bias is typically very small.

Knowledge Transfer in Deep Convolutional Neural Nets

- Computer Science
- Int. J. Artif. Intell. Tools
- 2007

This paper demonstrates that components of a trained deep convolutional neural net can constructively transfer information to another such net, and shows a clear advantage in relying upon transferred knowledge to learn new tasks when given small training sets, if the new tasks are sufficiently similar to the previously mastered one.

Recognizing Hand-written Digits Using Hierarchical Products of Experts

- Computer Science
- NIPS
- 2000

On the MNIST database, the system is comparable with current state-of-the-art discriminative methods, demonstrating that the product of experts learning procedure can produce effective generative models of high-dimensional data.

Optimal unsupervised learning in a single-layer linear feedforward neural network

- Computer Science
- Neural Networks
- 1989

An optimality principle based upon preserving maximal information in the output units is proposed, and an unsupervised learning algorithm based upon a Hebbian learning rule, which achieves the desired optimality, is presented.

Boosting a weak learning algorithm by majority

- Computer Science
- COLT '90
- 1990

An algorithm for improving the accuracy of algorithms for learning binary concepts by combining a large number of hypotheses, each of which is generated by training the given learning algorithm on a different set of examples, is presented.

Energy-Based Models for Sparse Overcomplete Representations

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2003

A new way of extending independent components analysis (ICA) to overcomplete representations that defines features as deterministic (linear) functions of the inputs and assigns energies to the features through the Boltzmann distribution.

Learning Sparse Topographic Representations with Products of Student-t Distributions

- Computer Science
- NIPS
- 2002

A model for natural images in which the probability of an image is proportional to the product of the probabilities of some filter outputs is proposed and used as a prior to derive the "iterated Wiener filter" for the purpose of denoising images.