• Corpus ID: 15539264

Rectified Linear Units Improve Restricted Boltzmann Machines

@inproceedings{nair2010rectified,
  title={Rectified Linear Units Improve Restricted Boltzmann Machines},
  author={Vinod Nair and Geoffrey E. Hinton},
  booktitle={Proceedings of the 27th International Conference on Machine Learning (ICML)},
  year={2010}
}
Restricted Boltzmann machines were developed using binary stochastic hidden units. These can be generalized by replacing each binary unit by an infinite number of copies that all have the same weights but have progressively more negative biases. The learning and inference rules for these "Stepped Sigmoid Units" are unchanged. They can be approximated efficiently by noisy, rectified linear units. Compared with binary units, these units learn features that are better for object recognition on the… 
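The construction in the abstract can be made concrete: summing infinitely many sigmoid copies of a unit, with biases offset by 0.5, 1.5, 2.5, ..., gives approximately log(1 + e^x) (the softplus), which is in turn approximated by a rectified linear unit with added Gaussian noise whose variance is sigmoid(x). A minimal NumPy sketch of both views (function names are mine, not from the paper):

```python
import numpy as np

def stepped_sigmoid_mean(x, n_copies=100):
    """Expected total activity of n_copies binary units sharing input x,
    with biases offset by -0.5, -1.5, -2.5, ...; sums toward log(1 + e^x)."""
    offsets = np.arange(n_copies) + 0.5
    return np.sum(1.0 / (1.0 + np.exp(-(x[..., None] - offsets))), axis=-1)

def softplus(x):
    """Smooth approximation of the stepped-sigmoid sum."""
    return np.log1p(np.exp(x))

def noisy_relu(x, rng):
    """NReLU-style sample: max(0, x + noise), with noise variance sigmoid(x)."""
    sigma2 = 1.0 / (1.0 + np.exp(-x))  # variance of the added Gaussian noise
    return np.maximum(0.0, x + rng.normal(0.0, np.sqrt(sigma2), size=np.shape(x)))

x = np.array([-2.0, 0.0, 3.0])
print(stepped_sigmoid_mean(x))  # close to softplus(x)
print(softplus(x))
```

The stepped-sigmoid sum and the softplus agree to within a few hundredths over this range, which is why the cheap rectified-linear approximation preserves the learning rules.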


Restricted Boltzmann Machine with Adaptive Local Hidden Units
Experiments on handwritten digits and human faces show that the proposed RBM variant with adaptive local hidden units (ALRBM) can learn region-based local feature representations that adapt automatically to the content of the images.
On rectified linear units for speech processing
This work shows that substituting the logistic units with rectified linear units can improve generalization and make training of deep networks faster and simpler.
Restricted Boltzmann Machines With Gaussian Visible Units Guided by Pairwise Constraints
This paper proposes a pairwise-constraints RBM with Gaussian visible units (pcGRBM), in which the learning procedure is guided by pairwise constraints (PCs) and encoding is conducted under this guidance, to enhance the expressive ability of traditional RBMs.
Sparse hidden units activation in Restricted Boltzmann Machine
A new regularization term for sparse hidden-unit activation in the Restricted Boltzmann Machine (RBM) is studied, based on the symmetric Kullback-Leibler divergence between the actual and the desired distribution over the active hidden units.
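As a rough illustration of the idea summarized above (a hypothetical sketch in my own notation, not the paper's exact formulation), a symmetric-KL sparsity penalty comparing each hidden unit's mean activation to a target sparsity level might look like:

```python
import numpy as np

def symmetric_kl_sparsity(hidden_probs, target=0.1, eps=1e-8):
    """Hypothetical sketch: symmetric KL penalty D(p||q) + D(q||p), summed
    over hidden units, between a target Bernoulli sparsity p and each
    unit's mean activation q_j over the batch."""
    q = np.clip(hidden_probs.mean(axis=0), eps, 1 - eps)  # mean activation per unit
    p = target
    kl_pq = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
    kl_qp = q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))
    return np.sum(kl_pq + kl_qp)
```

The penalty is zero exactly when every unit's mean activation equals the target, and its gradient would be added to the usual RBM weight update with some regularization strength.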
A Spike and Slab Restricted Boltzmann Machine
We introduce the spike and slab Restricted Boltzmann Machine, characterized by having both a real-valued vector, the slab, and a binary variable, the spike, associated with each unit in the hidden layer.
An Efficient Learning Procedure for Deep Boltzmann Machines
A new learning algorithm is presented for Boltzmann machines that contain many layers of hidden variables, with results on the MNIST and NORB data sets showing that deep Boltzmann machines learn very good generative models of handwritten digits and 3D objects.
Reducing Parameter Space for Neural Network Training
On better training the infinite restricted Boltzmann machines
Experimental results indicate that the proposed training strategy can greatly accelerate learning and enhance the generalization ability of iRBMs.
Phone recognition with deep sparse rectifier neural networks
  • L. Tóth
  • Computer Science
    2013 IEEE International Conference on Acoustics, Speech and Signal Processing
  • 2013
It is shown that a deep architecture of rectifier neurons can attain the same recognition accuracy as deep neural networks, but without the need for pre-training.


Rate-coded Restricted Boltzmann Machines for Face Recognition
We describe a neurally-inspired, unsupervised learning algorithm that builds a non-linear generative model for pairs of face images from the same individual. Individuals are then recognized by…
Implicit Mixtures of Restricted Boltzmann Machines
Results for the MNIST and NORB datasets are presented showing that the implicit mixture of RBMs learns clusters that reflect the class structure in the data.
A Fast Learning Algorithm for Deep Belief Nets
A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
A Hierarchical Community of Experts
It is shown that Gibbs sampling can be used to learn the parameters of the linear and binary units even when the sampling is so brief that the Markov chain is far from equilibrium.
Phone recognition using Restricted Boltzmann Machines
Conditional Restricted Boltzmann Machines (CRBMs) have recently proved to be very effective for modeling motion capture sequences, and this paper investigates the application of this more powerful type of generative model to acoustic modeling.
Unsupervised Learning of Distributions of Binary Vectors Using 2-Layer Networks
It is shown that arbitrary distributions of binary vectors can be approximated by the combination machine, that the weight vectors in the model can be interpreted as high-order correlation patterns among the input bits, and that the combination machine can be used as a mechanism for detecting these patterns.
Diffusion Networks, Products of Experts, and Factor Analysis
It is shown that when the unit activation functions are linear, this PoE architecture is equivalent to a factor analyzer, which suggests novel non-linear generalizations of factor analysis and independent component analysis that could be implemented using interactive neural circuitry.
What is the best multi-stage architecture for object recognition?
It is shown that using non-linearities that include rectification and local contrast normalization is the single most important ingredient for good accuracy on object recognition benchmarks and that two stages of feature extraction yield better accuracy than one.
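To illustrate the two ingredients named in the summary above, here is a simplified sketch of absolute-value rectification followed by local contrast normalization, using a plain box window for the local statistics (the paper uses Gaussian-weighted neighborhoods; all names here are my own):

```python
import numpy as np

def rectify(x):
    """Absolute-value rectification of a feature map."""
    return np.abs(x)

def box_mean(img, k=3):
    """Mean over a k x k window with edge padding (box-window stand-in
    for a Gaussian-weighted neighborhood)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def local_contrast_norm(img, k=3, eps=1e-5):
    """Subtractive normalization (remove local mean), then divisive
    normalization by the local standard deviation."""
    centered = img - box_mean(img, k)
    std = np.sqrt(box_mean(centered ** 2, k))
    return centered / np.maximum(std, eps)

feat = rectify(np.random.default_rng(0).normal(size=(8, 8)))
normed = local_contrast_norm(feat)
```

A constant region maps to zero after normalization, so the output emphasizes local structure rather than absolute intensity, which is the effect the benchmark results attribute this non-linearity's gains to.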
Reducing the Dimensionality of Data with Neural Networks
This work describes an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.
Learning methods for generic object recognition with invariance to pose and lighting
  • Yann LeCun, F. Huang, L. Bottou
  • Computer Science
    Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.
  • 2004
A real-time version of the system was implemented that can detect and classify objects in natural scenes at around 10 frames per second; SVMs proved impractical, while convolutional nets yielded 16.7% error.