# The Helmholtz Machine

```bibtex
@article{Dayan1995TheHM,
  title   = {The Helmholtz Machine},
  author  = {Peter Dayan and Geoffrey E. Hinton and Radford M. Neal and Richard S. Zemel},
  journal = {Neural Computation},
  year    = {1995},
  volume  = {7},
  pages   = {889-904}
}
```

Discovering the structure inherent in a set of patterns is a fundamental aim of statistical inference or learning. One fruitful approach is to build a parameterized stochastic generative model, independent draws from which are likely to produce the patterns. For all but the simplest generative models, each pattern can be generated in exponentially many ways. It is thus intractable to adjust the parameters to maximize the probability of the observed patterns. We describe a way of finessing this…
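The way the paper finesses this intractability is the wake-sleep procedure: a bottom-up recognition network proposes hidden causes for each pattern, and a top-down generative network learns to reproduce the pattern from those causes. A minimal single-hidden-layer sketch (the toy data, sizes, and learning rate below are illustrative, not the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Tiny one-hidden-layer Helmholtz machine over 6-bit vectors.
n_vis, n_hid, lr = 6, 3, 0.1
W_gen = np.zeros((n_hid, n_vis))  # generative (top-down) weights
b_gen = np.zeros(n_hid)           # generative biases on hidden units
W_rec = np.zeros((n_vis, n_hid))  # recognition (bottom-up) weights

# Toy ensemble: the first three bits move together, and so do the last three.
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]], dtype=float)

for _ in range(2000):
    d = data[rng.integers(len(data))]
    # Wake phase: sample hidden causes from the recognition model,
    # then nudge the generative model to reconstruct d from them.
    h = (rng.random(n_hid) < sigmoid(d @ W_rec)).astype(float)
    W_gen += lr * np.outer(h, d - sigmoid(h @ W_gen))
    b_gen += lr * (h - sigmoid(b_gen))
    # Sleep phase: "dream" a (hidden, visible) pair from the generative
    # model, then nudge the recognition model to recover the hidden state.
    h_dream = (rng.random(n_hid) < sigmoid(b_gen)).astype(float)
    v_dream = (rng.random(n_vis) < sigmoid(h_dream @ W_gen)).astype(float)
    W_rec += lr * np.outer(v_dream, h_dream - sigmoid(v_dream @ W_rec))
```

Each phase is a purely local delta rule, which is what makes the scheme attractive as a neural model.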

## 1,180 Citations

A Bayesian Unsupervised Learning Algorithm that Scales

- Computer Science
- 2007

A novel hierarchical generative model, which can be viewed as a nonlinear generalization of factor analysis and can be implemented in a neural network, performs perceptual inference in a probabilistically consistent manner by using top-down, bottom-up, and lateral connections.

Recurrent Sampling Models for the Helmholtz Machine

- Computer Science, Neural Computation
- 1999

This article suggests using either a Markov random field or an alternative stochastic sampling architecture to capture explicitly particular forms of dependence within each layer, modeling correlations within layers of the generative or recognition models.

Bayesian Unsupervised Learning of Higher Order Structure

- Computer Science, NIPS
- 1996

This work presents an algorithm that efficiently discovers higher order structure using EM and Gibbs sampling and can be interpreted as a stochastic recurrent network in which ambiguity in lower-level states is resolved through feedback from higher levels.

Cascaded redundancy reduction.

- Computer Science, Network
- 1998

A method for incrementally constructing a hierarchical generative model of an ensemble of binary data vectors; the model is composed of stochastic, binary, logistic units and is built with the goal of minimizing the information required to describe the data vectors using the model.

Using Helmholtz Machines to Analyze Multi-channel Neuronal Recordings

- Computer Science, NIPS
- 1997

This work presents an algorithm for automated discovery of stochastic firing patterns in large ensembles of neurons, from the "Helmholtz Machine" family, which attempts to predict the observed spike patterns in the data.

Mean field approach to learning in Boltzmann Machines

- Computer Science
- 1997

A new approximate learning algorithm for Boltzmann Machines is presented, which is based on mean field theory and the linear response theorem, and it is shown how the weights can be directly computed from the fixed point equation of the learning rules.
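The fixed-point equation underlying such mean-field approximations has a simple form: for a Boltzmann machine with ±1 units, the mean-field magnetizations satisfy m_i = tanh(θ_i + Σ_j w_ij m_j). A small sketch, with illustrative (made-up) weights and biases:

```python
import numpy as np

# Mean-field fixed point for a 3-unit Boltzmann machine with ±1 units:
#   m_i = tanh(theta_i + sum_j w_ij * m_j)
# iterated to convergence. Weights and biases are illustrative only.
w = np.array([[ 0.0, 0.5, -0.3],
              [ 0.5, 0.0,  0.2],
              [-0.3, 0.2,  0.0]])
theta = np.array([0.1, -0.2, 0.05])

m = np.zeros(3)
for _ in range(200):
    m_new = np.tanh(theta + w @ m)
    if np.max(np.abs(m_new - m)) < 1e-10:
        m = m_new
        break
    m = m_new
```

For weak couplings the iteration is a contraction and converges quickly; the resulting magnetizations approximate the intractable expectations needed by the learning rule.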

Emergence of Modularity within One Sheet of Intrinsically Active Stochastic Neurons

- Computer Science
- 2000

This work investigates how modular structure within a neural net of stochastic neurons can emerge from intrinsic dynamics and from unsupervised learning of data, and shows, using a different data set, that a parallel structure emerges which matches the data.

Training Bidirectional Helmholtz Machines

- Computer Science
- 2015

This work presents a lower-bound for the likelihood of the generative model and shows that optimizing this bound regularizes the model so that the Bhattacharyya distance between the bottom-up and top-down approximate distributions is minimized.
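For reference, the Bhattacharyya distance mentioned in this abstract has a simple closed form for discrete distributions, D_B(p, q) = −log Σ_x √(p(x) q(x)); a small sketch with arbitrary example distributions:

```python
import numpy as np

def bhattacharyya_distance(p, q):
    """D_B(p, q) = -log sum_x sqrt(p(x) * q(x)) for discrete distributions."""
    return -np.log(np.sum(np.sqrt(p * q)))

# Arbitrary example distributions over three outcomes.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
# The distance is 0 for identical distributions and grows as p and q diverge.
```

Minimizing this distance pulls the bottom-up and top-down approximate distributions toward one another, which is the regularizing effect the abstract describes.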

## References

SHOWING 1-10 OF 58 REFERENCES

A Bayesian Analysis of Self-Organizing Maps

- Mathematics, Neural Computation
- 1994

Bayesian methods are used to analyze some of the properties of a special type of Markov chain and derive the theory of self-supervision, in which the higher layers of a multilayer network supervise the lower layers, even though overall there is no external teacher.

Unsupervised Learning of Mixtures of Multiple Causes in Binary Data

- Computer Science, NIPS
- 1993

This paper presents a formulation for unsupervised learning of clusters reflecting multiple causal structure in binary data, and demonstrates the algorithm's ability to discover coherent multiple causal representations in noisy test data and in images of printed characters.

Dynamic Model of Visual Recognition Predicts Neural Response Properties in the Visual Cortex

- Biology, Neural Computation
- 1997

A hierarchical network model of visual recognition that explains experimental observations regarding neural responses in both free viewing and fixating conditions by using a form of the extended Kalman filter as given by the minimum description length (MDL) principle is described.

Self-organizing neural network that discovers surfaces in random-dot stereograms

- Computer Science, Nature
- 1992

The authors' simulations show that when the learning procedure is applied to adjacent patches of two-dimensional images, it allows a neural network that has no prior knowledge of the third dimension to discover depth in random dot stereograms of curved surfaces.

A Multiple Cause Mixture Model for Unsupervised Learning

- Computer Science, Neural Computation
- 1995

A formulation for unsupervised learning of clusters reflecting multiple causal structure in binary data, which employs an objective function and iterative gradient descent learning algorithm resembling the conventional mixture model and demonstrates its ability to discover coherent multiple causal representations in several experimental data sets.

A massively parallel architecture for a self-organizing neural pattern recognition machine

- Computer Science, Comput. Vis. Graph. Image Process.
- 1987

Learning Population Codes by Minimizing Description Length

- Computer Science, Neural Computation
- 1995

It is shown how MDL can be used to develop highly redundant population codes, thus allowing flexibility, as the network develops a discontinuous topography when presented with different input classes.

Autoencoders, Minimum Description Length and Helmholtz Free Energy

- Computer Science, NIPS
- 1993

It is shown that the recognition weights of an autoencoder can be used to compute an approximation to the Boltzmann distribution and that this approximation gives an upper bound on the description length.
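The bound in question is the variational (Helmholtz) free energy: for any recognition distribution Q over hidden states h, F = Σ_h Q(h)(log Q(h) − log P(h, v)) ≥ −log P(v), so F upper-bounds the description length of v. A small numerical sketch with a made-up joint distribution:

```python
import numpy as np

# Illustrative joint P(h, v) for one fixed v and four hidden states,
# and an arbitrary recognition distribution Q over those states.
P_joint = np.array([0.3, 0.1, 0.05, 0.05])
Q = np.array([0.5, 0.25, 0.15, 0.1])

# Variational free energy F = E_Q[log Q(h) - log P(h, v)].
F = np.sum(Q * (np.log(Q) - np.log(P_joint)))
neg_log_pv = -np.log(P_joint.sum())  # exact description length -log P(v)
# F >= neg_log_pv, with equality iff Q equals the true posterior P(h | v).
```

The gap between F and −log P(v) is exactly KL(Q ‖ P(h | v)), which is why tightening the bound drives the recognition distribution toward the true posterior.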