A Fast Learning Algorithm for Deep Belief Nets

Geoffrey E. Hinton, Simon Osindero, Yee Whye Teh
Neural Computation
We show how to use complementary priors to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive… 
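The layer-at-a-time idea in the abstract can be illustrated with a minimal sketch. It assumes each layer is a binary restricted Boltzmann machine trained with one-step contrastive divergence (CD-1) on full batches. The helper names `train_rbm` and `greedy_pretrain`, the learning rate, the epoch count, and the layer sizes are illustrative assumptions, not values from the paper, and the slower fine-tuning stage is not shown.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_rbm(data, n_hidden, epochs=5, lr=0.1, rng=None):
    """Train one binary RBM with CD-1 on the full batch (hypothetical helper)."""
    rng = np.random.default_rng(0) if rng is None else rng
    n_samples, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden probabilities given the data.
        h_prob = sigmoid(data @ W + b_hid)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one Gibbs step back to the visibles and up again.
        v_recon = sigmoid(h_sample @ W.T + b_vis)
        h_recon = sigmoid(v_recon @ W + b_hid)
        # CD-1 update: data statistics minus reconstruction statistics.
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n_samples
        b_vis += lr * (data - v_recon).mean(axis=0)
        b_hid += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_vis, b_hid


def greedy_pretrain(data, layer_sizes, **kwargs):
    """Stack RBMs: each layer is trained on the hidden activities of the one below."""
    layers = []
    x = data
    for n_hidden in layer_sizes:
        W, b_vis, b_hid = train_rbm(x, n_hidden, **kwargs)
        layers.append((W, b_vis, b_hid))
        # Hidden probabilities become the "data" for the next layer.
        x = sigmoid(x @ W + b_hid)
    return layers
```

Calling `greedy_pretrain(data, [8, 4])` on a binary array of shape `(n_samples, n_visible)` returns one `(W, b_vis, b_hid)` tuple per layer. In this sketch the last RBM in the stack stands in for the undirected top-level associative memory mentioned in the abstract.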

Sparse Deep Belief Net for Handwritten Digits Classification

Another version of the Sparse Deep Belief Net is proposed, which applies the differentiable sparse coding method to train the first level of the deep network and then trains the higher layers with RBMs, leading to state-of-the-art performance on the classification of handwritten digits.

Efficient Learning of Deep Boltzmann Machines

We present a new approximate inference algorithm for Deep Boltzmann Machines (DBM’s), a generative model with many layers of hidden variables. The algorithm learns a separate “recognition” model that

Exploring Strategies for Training Deep Neural Networks

These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy helps the optimization by initializing weights in a region near a good local minimum, but also implicitly acts as a sort of regularization that brings better generalization and encourages internal distributed representations that are high-level abstractions of the input.

An Efficient Learning Procedure for Deep Boltzmann Machines

A new learning algorithm for Boltzmann machines that contain many layers of hidden variables is presented, along with results on the MNIST and NORB data sets showing that deep Boltzmann machines learn very good generative models of handwritten digits and 3D objects.

On the quantitative analysis of deep belief networks

It is shown that Annealed Importance Sampling (AIS) can be used to efficiently estimate the partition function of an RBM, and a novel AIS scheme for comparing RBM's with different architectures is presented.

Unsupervised feature learning using Markov deep belief network

A new deep learning model, the Markov DBN (MDBN), is proposed to address shortcomings of the DBN by giving it a new way to reduce the computational burden and handle large images.

Partitioning Large Scale Deep Belief Networks Using Dropout

This work considers a well-known machine learning model, deep belief networks (DBNs), and proposes an approach that can use the computing clusters in a distributed environment to train large models, while the dense matrix computations within a single machine are sped up using graphics processors (GPU).

Modular deep belief networks that do not forget

The M-DBN is introduced, an unsupervised modular DBN that addresses the forgetting problem and retains learned features even after those features are removed from the training data, while monolithic DBNs of comparable size forget feature mappings learned before.

Greedy Layer-Wise Training of Deep Networks

These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.

Normal sparse Deep Belief Network

This paper proposes a new method, nsDBN, whose behavior depends on how far the activation of the hidden units deviates from a (low) fixed value, and which has a variance parameter that controls the degree of sparseness.



Visual Recognition and Inference Using Dynamic Overcomplete Sparse Learning

It is shown that increasing the degree of overcompleteness improves recognition performance in difficult scenes with occluded objects in clutter.

Connectionist Learning of Belief Networks

Rate-coded Restricted Boltzmann Machines for Face Recognition

We describe a neurally-inspired, unsupervised learning algorithm that builds a non-linear generative model for pairs of face images from the same individual. Individuals are then recognized by

On Contrastive Divergence Learning

The properties of CD learning are studied and it is shown that it provides biased estimates in general, but that the bias is typically very small.

Knowledge Transfer in Deep Convolutional Neural Nets

This paper demonstrates that components of a trained deep convolutional neural net can constructively transfer information to another such net, and shows a clear advantage in relying upon transferred knowledge to learn new tasks from small training sets, provided the new tasks are sufficiently similar to the previously mastered ones.

Recognizing Hand-written Digits Using Hierarchical Products of Experts

On the MNIST database, the system is comparable with current state-of-the-art discriminative methods, demonstrating that the product of experts learning procedure can produce effective generative models of high-dimensional data.

Optimal unsupervised learning in a single-layer linear feedforward neural network

Boosting a weak learning algorithm by majority

An algorithm for improving the accuracy of algorithms for learning binary concepts by combining a large number of hypotheses, each of which is generated by training the given learning algorithm on a different set of examples, is presented.

Energy-Based Models for Sparse Overcomplete Representations

A new way of extending independent components analysis (ICA) to overcomplete representations is presented that defines features as deterministic (linear) functions of the inputs and assigns energies to the features through the Boltzmann distribution.

Learning Sparse Topographic Representations with Products of Student-t Distributions

A model for natural images in which the probability of an image is proportional to the product of the probabilities of some filter outputs is proposed and used as a prior to derive the "iterated Wiener filter" for the purpose of denoising images.