The Helmholtz Machine

@article{Dayan1995TheHM,
  title={The Helmholtz Machine},
  author={Peter Dayan and Geoffrey E. Hinton and Radford M. Neal and Richard S. Zemel},
  journal={Neural Computation},
  year={1995},
  volume={7},
  pages={889--904}
}
Discovering the structure inherent in a set of patterns is a fundamental aim of statistical inference or learning. One fruitful approach is to build a parameterized stochastic generative model, independent draws from which are likely to produce the patterns. For all but the simplest generative models, each pattern can be generated in exponentially many ways. It is thus intractable to adjust the parameters to maximize the probability of the observed patterns. We describe a way of finessing this… 
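The "finesse" the abstract refers to is a second, bottom-up recognition network that approximates the intractable posterior over hidden causes; generative and recognition weights can then be trained with simple local delta rules, as in the companion wake-sleep algorithm (Hinton, Dayan, Frey & Neal, 1995). Below is a minimal sketch of one wake-sleep step for a two-layer binary Helmholtz machine; the layer sizes, learning rate, and helper functions are illustrative choices, not details from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sample(p):
        # Draw a vector of stochastic binary units with firing probabilities p.
        return (rng.random(p.shape) < p).astype(float)

    n_vis, n_hid, lr = 8, 4, 0.05      # illustrative sizes and learning rate
    R = np.zeros((n_hid, n_vis + 1))   # recognition weights (last column = bias)
    G = np.zeros((n_vis, n_hid + 1))   # generative weights (last column = bias)
    b_h = np.zeros(n_hid)              # generative biases of the hidden layer

    def wake_sleep_step(v):
        global R, G, b_h
        # Wake phase: recognize bottom-up, then train the generative
        # weights to reconstruct the data from the sampled explanation.
        h = sample(sigmoid(R @ np.append(v, 1.0)))
        p_v = sigmoid(G @ np.append(h, 1.0))
        G += lr * np.outer(v - p_v, np.append(h, 1.0))      # local delta rule
        b_h += lr * (h - sigmoid(b_h))

        # Sleep phase: dream a fantasy top-down, then train the recognition
        # weights to recover the hidden causes of the fantasy.
        h_f = sample(sigmoid(b_h))
        v_f = sample(sigmoid(G @ np.append(h_f, 1.0)))
        p_h = sigmoid(R @ np.append(v_f, 1.0))
        R += lr * np.outer(h_f - p_h, np.append(v_f, 1.0))  # local delta rule

    for _ in range(1000):
        wake_sleep_step(sample(np.full(n_vis, 0.5)))        # toy random data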
A Bayesian Unsupervised Learning Algorithm that Scales
TLDR
A novel hierarchical generative model, which can be viewed as a nonlinear generalization of factor analysis and can be implemented in a neural network, performs perceptual inference in a probabilistically consistent manner using top-down, bottom-up, and lateral connections.
Recurrent Sampling Models for the Helmholtz Machine
  • P. Dayan
  • Computer Science
    Neural Computation
  • 1999
TLDR
This article suggests using either a Markov random field or an alternative stochastic sampling architecture to capture explicitly particular forms of dependence within each layer, thereby modeling correlations within layers of the generative or recognition models.
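As a concrete illustration of within-layer dependence, the sketch below Gibbs-samples binary latent units under a pairwise energy model with lateral couplings, so each unit's conditional depends on both the data and the other units in its layer; all weights and sizes here are placeholders rather than the article's model.

    import numpy as np

    rng = np.random.default_rng(1)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    n_vis, n_hid = 6, 4                      # illustrative sizes
    W = rng.normal(0, 0.1, (n_vis, n_hid))   # visible-to-hidden couplings
    L = rng.normal(0, 0.1, (n_hid, n_hid))
    L = (L + L.T) / 2.0                      # symmetric lateral couplings
    np.fill_diagonal(L, 0.0)                 # no self-coupling
    b = np.zeros(n_hid)

    def gibbs_sweep(v, h):
        # One sweep: resample each binary latent from its conditional
        # p(h_j = 1 | v, h_{-j}) under the pairwise energy model.
        for j in range(n_hid):
            p = sigmoid(v @ W[:, j] + h @ L[:, j] + b[j])
            h[j] = float(rng.random() < p)
        return h

    v = rng.integers(0, 2, n_vis).astype(float)
    h = rng.integers(0, 2, n_hid).astype(float)
    for _ in range(100):
        h = gibbs_sweep(v, h)                # h approaches a sample from p(h | v)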
Bayesian Unsupervised Learning of Higher Order Structure
TLDR
This work presents an algorithm that efficiently discovers higher order structure using EM and Gibbs sampling and can be interpreted as a stochastic recurrent network in which ambiguity in lower-level states is resolved through feedback from higher levels.
Cascaded redundancy reduction
TLDR
A method for incrementally constructing, from stochastic binary logistic units, a hierarchical generative model of an ensemble of binary data vectors, with the goal of minimizing the information required to describe the data vectors using the model.
Using Helmholtz Machines to Analyze Multi-channel Neuronal Recordings
TLDR
This work presents an algorithm, from the "Helmholtz Machine" family, for the automated discovery of stochastic firing patterns in large ensembles of neurons; the algorithm attempts to predict the observed spike patterns in the data.
Mean field approach to learning in Boltzmann Machines
TLDR
A new approximate learning algorithm for Boltzmann Machines is presented, which is based on mean field theory and the linear response theorem, and it is shown how the weights can be directly computed from the fixed point equation of the learning rules.
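For reference, the naive mean-field equations for a Boltzmann machine with ±1 units are m_i = tanh(Σ_j w_ij m_j + θ_i), and the linear response theorem then yields the connected correlations from a matrix inverse evaluated at the fixed point. The sketch below shows a damped fixed-point iteration under assumed random couplings; the sizes and damping factor are illustrative.

    import numpy as np

    rng = np.random.default_rng(2)

    n = 5
    W = rng.normal(0, 0.3, (n, n))
    W = (W + W.T) / 2.0
    np.fill_diagonal(W, 0.0)            # symmetric couplings, no self-coupling
    theta = rng.normal(0, 0.1, n)       # biases

    # Damped iteration of the mean-field equations m_i = tanh(sum_j W_ij m_j + theta_i)
    m = np.zeros(n)
    for _ in range(500):
        m_new = np.tanh(W @ m + theta)
        if np.max(np.abs(m_new - m)) < 1e-10:
            break
        m = 0.5 * m + 0.5 * m_new       # damping helps convergence

    # Linear response: the connected correlations C_ij = <s_i s_j> - m_i m_j
    # are approximated by inverting (C^-1)_ij = delta_ij / (1 - m_i^2) - W_ij.
    C = np.linalg.inv(np.diag(1.0 / (1.0 - m**2)) - W)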
Emergence of Modularity within One Sheet of Intrinsically Active Stochastic Neurons
TLDR
This work investigates how modular structure within a neural net of stochastic neurons can emerge from intrinsic dynamics and from unsupervised learning of data and shows, using a different data set, that a parallel structure emerges which matches the data.
Training Bidirectional Helmholtz Machines
TLDR
This work presents a lower bound on the likelihood of the generative model and shows that optimizing this bound regularizes the model so that the Bhattacharyya distance between the bottom-up and top-down approximate distributions is minimized.
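For reference, the Bhattacharyya distance between two discrete distributions p and q is D_B(p, q) = -ln Σ_x sqrt(p(x) q(x)), which is zero exactly when p = q. A minimal computation on made-up distributions:

    import numpy as np

    def bhattacharyya(p, q):
        # D_B(p, q) = -ln( sum_x sqrt(p(x) * q(x)) ); 0 iff p == q.
        return -np.log(np.sum(np.sqrt(p * q)))

    p = np.array([0.7, 0.2, 0.1])
    q = np.array([0.5, 0.3, 0.2])
    print(bhattacharyya(p, q))   # small positive value; 0.0 for identical p, q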
...

References

SHOWING 1-10 OF 58 REFERENCES
A Bayesian Analysis of Self-Organizing Maps
TLDR
Bayesian methods are used to analyze some of the properties of a special type of Markov chain and derive the theory of self-supervision, in which the higher layers of a multilayer network supervise the lower layers, even though overall there is no external teacher.
A Learning Algorithm for Boltzmann Machines
Unsupervised Learning of Mixtures of Multiple Causes in Binary Data
TLDR
This paper presents a formulation for unsupervised learning of clusters reflecting multiple causal structure in binary data, and demonstrates the algorithm's ability to discover coherent multiple causal representations of noisy test data and in images of printed characters.
Dynamic Model of Visual Recognition Predicts Neural Response Properties in the Visual Cortex
TLDR
A hierarchical network model of visual recognition is described that explains experimental observations regarding neural responses in both free-viewing and fixating conditions by using a form of the extended Kalman filter, as given by the minimum description length (MDL) principle.
Self-organizing neural network that discovers surfaces in random-dot stereograms
TLDR
The authors' simulations show that when the learning procedure is applied to adjacent patches of two-dimensional images, it allows a neural network that has no prior knowledge of the third dimension to discover depth in random dot stereograms of curved surfaces.
A Multiple Cause Mixture Model for Unsupervised Learning
  • E. Saund
  • Computer Science
    Neural Computation
  • 1995
TLDR
A formulation for unsupervised learning of clusters reflecting multiple causal structure in binary data, which employs an objective function and iterative gradient descent learning algorithm resembling the conventional mixture model and demonstrates its ability to discover coherent multiple causal representations in several experimental data sets.
Connectionist Learning of Belief Networks
A massively parallel architecture for a self-organizing neural pattern recognition machine
Learning Population Codes by Minimizing Description Length
TLDR
It is shown how MDL can be used to develop highly redundant population codes, thus allowing flexibility, as the network develops a discontinuous topography when presented with different input classes.
Autoencoders, Minimum Description Length and Helmholtz Free Energy
TLDR
It is shown that the recognition weights of an autoencoder can be used to compute an approximation to the Boltzmann distribution and that this approximation gives an upper bound on the description length.
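The bound in question is the Helmholtz free energy F = E_{h~q}[log q(h|v) - log p(v, h)] >= -log p(v), which can be read as the expected description length of a data vector under a bits-back code. The sketch below evaluates F exactly for a tiny binary model by enumerating the hidden states; all sizes and parameters are invented for illustration.

    import numpy as np
    from itertools import product

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def bernoulli_logp(x, p):
        # Log probability of binary vector x under factorial Bernoulli means p.
        return float(np.sum(x * np.log(p) + (1 - x) * np.log(1 - p)))

    rng = np.random.default_rng(3)
    n_vis, n_hid = 4, 3
    G = rng.normal(0, 0.5, (n_vis, n_hid))   # generative weights
    b_h = rng.normal(0, 0.5, n_hid)          # hidden prior biases
    R = rng.normal(0, 0.5, (n_hid, n_vis))   # recognition weights

    v = rng.integers(0, 2, n_vis).astype(float)
    q_h = sigmoid(R @ v)                     # factorial recognition distribution q(h|v)

    F = 0.0
    for bits in product([0.0, 1.0], repeat=n_hid):
        h = np.array(bits)
        log_q = bernoulli_logp(h, q_h)                    # log q(h|v)
        log_p = bernoulli_logp(h, sigmoid(b_h)) \
              + bernoulli_logp(v, sigmoid(G @ h))         # log p(h) + log p(v|h)
        F += np.exp(log_q) * (log_q - log_p)              # E_q[log q - log p(v,h)]

    print(F)   # in nats; upper-bounds -log p(v), the code length of v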
...