A Fast Learning Algorithm for Deep Belief Nets

@article{Hinton2006AFL,
  title={A Fast Learning Algorithm for Deep Belief Nets},
  author={Geoffrey E. Hinton and Simon Osindero and Yee Whye Teh},
  journal={Neural Computation},
  year={2006},
  volume={18},
  pages={1527-1554}
}
We show how to use complementary priors to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm.
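Below is a minimal NumPy sketch of the greedy layer-wise idea described in the abstract: each layer is trained as a restricted Boltzmann machine (RBM) with one-step contrastive divergence (CD-1), and its hidden activations become the training data for the next layer. The function names (train_rbm, pretrain_dbn) and hyperparameters are illustrative assumptions; the paper's full procedure additionally involves tied weights, a top-level undirected associative memory, and contrastive wake-sleep fine-tuning, none of which is shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    return (rng.random(p.shape) < p).astype(float)

def train_rbm(data, n_hidden, epochs=10, lr=0.05, batch=100):
    """Train one RBM with single-step contrastive divergence (CD-1)."""
    n_vis = data.shape[1]
    W = 0.01 * rng.standard_normal((n_vis, n_hidden))
    b = np.zeros(n_vis)      # visible biases
    c = np.zeros(n_hidden)   # hidden biases
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), batch):
            v0 = data[i:i + batch]
            # positive phase: hidden probabilities given the data
            ph0 = sigmoid(c + v0 @ W)
            h0 = sample(ph0)
            # negative phase: one Gibbs step (probabilities used for the reconstruction)
            v1 = sigmoid(b + h0 @ W.T)
            ph1 = sigmoid(c + v1 @ W)
            # CD-1 parameter updates
            W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
            b += lr * (v0 - v1).mean(axis=0)
            c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

def pretrain_dbn(data, layer_sizes):
    """Greedy layer-wise pretraining: train an RBM, then feed its hidden
    activations upward as the 'data' for the next RBM."""
    params, x = [], data
    for n_hidden in layer_sizes:
        W, b, c = train_rbm(x.copy(), n_hidden)
        params.append((W, b, c))
        x = sigmoid(c + x @ W)   # up-pass: activations become the next layer's input
    return params
```

With MNIST-sized binary inputs one might call pretrain_dbn(images, [500, 500, 2000]), roughly mirroring the layer sizes used in the paper.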
Citations

Sparse Deep Belief Net for Handwritten Digits Classification
Another version of the sparse deep belief net is proposed, which applies a differentiable sparse coding method to train the first level of the deep network and then trains the higher layers with RBMs, leading to state-of-the-art performance on the classification of handwritten digits.
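The entry above trains its first layer with differentiable sparse coding. As a loose, hedged illustration of how sparsity is often encouraged in DBN layers (a target-activation penalty, as in sparse-RBM variants, rather than the exact method of that paper), the term below can be added to the hidden-bias update of a CD-trained RBM:

```python
import numpy as np

def sparsity_gradient(hidden_probs, target=0.05, weight=1.0):
    """Gradient of a simple target-sparsity penalty (an assumed, illustrative form).
    hidden_probs: (batch, n_hidden) matrix of p(h_j = 1 | v).
    Returns a per-unit term to add to the hidden-bias update; it is positive
    when a unit's mean activation is below the target and negative when above."""
    q = hidden_probs.mean(axis=0)   # current mean activation of each hidden unit
    return weight * (target - q)
```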
A Novel Sparse Deep Belief Network for Unsupervised Feature Learning
A novel version of the sparse deep belief network for unsupervised feature extraction is proposed; it learns hierarchical representations that mimic computations in the cortical hierarchy and obtains more discriminative representations than PCA and several basic deep belief network algorithms.
Efficient Learning of Deep Boltzmann Machines
We present a new approximate inference algorithm for Deep Boltzmann Machines (DBMs), a generative model with many layers of hidden variables. The algorithm learns a separate "recognition" model that …
Exploring Strategies for Training Deep Neural Networks
These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy helps the optimization by initializing weights in a region near a good local minimum, but also implicitly acts as a sort of regularization that brings better generalization and encourages internal distributed representations that are high-level abstractions of the input.
An Efficient Learning Procedure for Deep Boltzmann Machines
A new learning algorithm for Boltzmann machines that contain many layers of hidden variables is presented, along with results on the MNIST and NORB data sets showing that deep Boltzmann machines learn very good generative models of handwritten digits and 3D objects.
On the quantitative analysis of deep belief networks
It is shown that Annealed Importance Sampling (AIS) can be used to efficiently estimate the partition function of an RBM, and a novel AIS scheme for comparing RBMs with different architectures is presented.
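As a rough illustration of what such an estimate involves, here is a minimal NumPy sketch of AIS for an RBM's partition function, annealing from a base-rate model with visible biases b_A to the target RBM with weights W, visible biases b_B, and hidden biases c. The interpolation scheme, the number of runs and temperatures, and the choice of base model are illustrative assumptions, not the exact scheme of the cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_f(v, beta, W, b_B, c, b_A):
    """Unnormalised log-marginal of the intermediate RBM at inverse temperature beta.
    v: (n_runs, n_vis) binary matrix."""
    b_beta = (1.0 - beta) * b_A + beta * b_B
    return v @ b_beta + np.logaddexp(0.0, beta * (c + v @ W)).sum(axis=1)

def gibbs_step(v, beta, W, b_B, c, b_A):
    """One Gibbs transition that leaves the intermediate RBM at beta invariant."""
    b_beta = (1.0 - beta) * b_A + beta * b_B
    p_h = 1.0 / (1.0 + np.exp(-beta * (c + v @ W)))
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = 1.0 / (1.0 + np.exp(-(b_beta + beta * (h @ W.T))))
    return (rng.random(p_v.shape) < p_v).astype(float)

def ais_log_Z(W, b_B, c, b_A, n_runs=100, n_betas=1000):
    betas = np.linspace(0.0, 1.0, n_betas + 1)
    # sample starting states from the base-rate model (independent Bernoulli pixels)
    p0 = 1.0 / (1.0 + np.exp(-b_A))
    v = (rng.random((n_runs, b_A.size)) < p0).astype(float)
    log_w = np.zeros(n_runs)
    for k in range(1, n_betas + 1):
        log_w += log_f(v, betas[k], W, b_B, c, b_A) - log_f(v, betas[k - 1], W, b_B, c, b_A)
        v = gibbs_step(v, betas[k], W, b_B, c, b_A)
    # log Z of the base model: free hidden units plus independent visible biases
    log_Z0 = np.logaddexp(0.0, b_A).sum() + c.size * np.log(2.0)
    # numerically stable average of the importance weights
    m = log_w.max()
    return log_Z0 + np.log(np.mean(np.exp(log_w - m))) + m
```

Given two RBMs, comparing their estimated log partition functions (together with their unnormalised data log-probabilities) enables the kind of model comparison the entry above describes.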
Unsupervised feature learning using Markov deep belief network
A new deep learning model, the Markov DBN (MDBN), is proposed to address shortcomings of the DBN: it reduces the computational burden and can handle large images.
Partitioning Large Scale Deep Belief Networks Using Dropout
This work considers deep belief networks (DBNs), a well-known machine learning model, and proposes an approach that uses computing clusters in a distributed environment to train large models, while the dense matrix computations within a single machine are sped up using graphics processors (GPUs).
Modular deep belief networks that do not forget
The M-DBN, an unsupervised modular DBN that addresses the forgetting problem, is introduced; it retains learned features even after those features are removed from the training data, whereas monolithic DBNs of comparable size forget feature mappings learned earlier.
Greedy Layer-Wise Training of Deep Networks
These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input and bringing better generalization.

References

Showing 10 of the 33 references.
Visual Recognition and Inference Using Dynamic Overcomplete Sparse Learning
It is shown that increasing the degree of overcompleteness improves recognition performance in difficult scenes with occluded objects in clutter; the learning procedure takes advantage of the sparsity of each layer to update only a small subset of elements in a large weight matrix at each iteration.
Connectionist Learning of Belief Networks
  • R. Neal, Artificial Intelligence, 1992
The “Gibbs sampling” simulation procedure for “sigmoid” and “noisy-OR” varieties of probabilistic belief networks can support maximum-likelihood learning from empirical data through local gradient ascent.
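To make the Gibbs procedure concrete, here is a minimal NumPy sketch of one sweep over the hidden units of a one-hidden-layer sigmoid belief net, with each unit resampled from its exact conditional given the data and the other hidden units. The parameter names and the restriction to a single hidden layer are assumptions for illustration, not Neal's full multi-layer formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_bernoulli(v, p):
    eps = 1e-12
    return v * np.log(p + eps) + (1 - v) * np.log(1 - p + eps)

def gibbs_sweep(v, h, W, b, c):
    """One Gibbs sweep over the hidden units of a sigmoid belief net with
    p(h_j = 1) = sigmoid(c_j) and p(v_i = 1 | h) = sigmoid(b_i + W[i] . h)."""
    for j in range(len(h)):
        h1, h0 = h.copy(), h.copy()
        h1[j], h0[j] = 1.0, 0.0
        # log-odds = prior log-odds + change in the log-likelihood of the visibles
        logit = c[j]
        logit += log_bernoulli(v, sigmoid(b + W @ h1)).sum()
        logit -= log_bernoulli(v, sigmoid(b + W @ h0)).sum()
        h[j] = float(rng.random() < sigmoid(logit))
    return h
```

After several sweeps, h approximates a sample from p(h | v); local gradient ascent on the log-likelihood then updates, for example, W[i][j] in proportion to h_j * (v_i - sigmoid(b_i + W[i] . h)).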
Rate-coded Restricted Boltzmann Machines for Face Recognition
We describe a neurally-inspired, unsupervised learning algorithm that builds a non-linear generative model for pairs of face images from the same individual. Individuals are then recognized by …
On Contrastive Divergence Learning
The properties of CD learning are studied and it is shown that it provides biased estimates in general, but that the bias is typically very small.
Knowledge Transfer in Deep Convolutional Neural Nets
This paper demonstrates that components of a trained deep convolutional neural net can constructively transfer information to another such net, and shows a clear advantage in relying upon transferred knowledge to learn new tasks when given small training sets, if the new tasks are sufficiently similar to the previously mastered one.
Recognizing Hand-written Digits Using Hierarchical Products of Experts
On the MNIST database, the system is comparable with current state-of-the-art discriminative methods, demonstrating that the product of experts learning procedure can produce effective generative models of high-dimensional data.
Optimal unsupervised learning in a single-layer linear feedforward neural network
An optimality principle based upon preserving maximal information in the output units is proposed, and an unsupervised learning algorithm based upon a Hebbian learning rule that achieves the desired optimality is presented.
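A well-known Hebbian rule of this kind is Sanger's generalized Hebbian algorithm, whose weight rows converge to the leading principal components of centred data, i.e. the linear projection that preserves the most output variance. The sketch below assumes that form; the learning rate and epoch count are illustrative assumptions.

```python
import numpy as np

def gha_train(X, n_components, lr=1e-3, epochs=50, seed=0):
    """Generalized Hebbian algorithm for a single linear layer.
    X: (n_samples, n_features), assumed centred.
    Returns W whose rows approximate the top principal components, in order."""
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((n_components, X.shape[1]))
    for _ in range(epochs):
        for x in X:
            y = W @ x   # outputs of the linear units
            # Hebbian term y x^T minus a lower-triangular term that decorrelates the units
            W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W
```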
Boosting a weak learning algorithm by majority
An algorithm for improving the accuracy of algorithms for learning binary concepts is presented; it combines a large number of hypotheses, each of which is generated by training the given learning algorithm on a different set of examples.
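Freund's boost-by-majority relies on a specific example-filtering and weighting scheme that is not reproduced here; the sketch below only illustrates the general idea named in the summary above, namely training many copies of a weak learner on different example sets and combining them by an unweighted majority vote. The weak learner (a threshold stump), the random-subset scheme, and the vote are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_stump(X, y):
    """Weak learner: the best single-feature threshold classifier (labels in {-1, +1})."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] > t, 1, -1)
                acc = np.mean(pred == y)
                if best is None or acc > best[0]:
                    best = (acc, j, t, sign)
    _, j, t, sign = best
    return lambda Xq: sign * np.where(Xq[:, j] > t, 1, -1)

def majority_ensemble(X, y, n_hypotheses=25, subset=0.5):
    """Train each hypothesis on a different random subset; predict by majority vote."""
    hyps = []
    for _ in range(n_hypotheses):
        idx = rng.choice(len(X), size=int(subset * len(X)), replace=False)
        hyps.append(train_stump(X[idx], y[idx]))
    return lambda Xq: np.sign(sum(h(Xq) for h in hyps))
```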
Energy-Based Models for Sparse Overcomplete Representations
A new way of extending independent components analysis (ICA) to overcomplete representations is proposed; it defines features as deterministic (linear) functions of the inputs and assigns energies to the features through the Boltzmann distribution.
Learning Sparse Topographic Representations with Products of Student-t Distributions
A model for natural images in which the probability of an image is proportional to the product of the probabilities of some filter outputs is proposed and used as a prior to derive the "iterated Wiener filter" for the purpose of denoising images.