• Corpus ID: 17861266

On Contrastive Divergence Learning

@inproceedings{carreiraperpinan2005cd,
  title={On Contrastive Divergence Learning},
  author={Miguel {\'A}. Carreira-Perpi{\~n}{\'a}n and Geoffrey E. Hinton},
  booktitle={International Conference on Artificial Intelligence and Statistics},
  year={2005}
}
Maximum-likelihood (ML) learning of Markov random fields is challenging because it requires estimates of averages that have an exponential number of terms. Markov chain Monte Carlo methods typically take a long time to converge on unbiased estimates, but Hinton (2002) showed that if the Markov chain is only run for a few steps, the learning can still work well and it approximately minimizes a different function called “contrastive divergence” (CD). CD learning has been successfully applied to… 
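The few-step idea described in the abstract can be sketched concretely for a binary restricted Boltzmann machine. The sketch below is illustrative, not the paper's code: the function name `cd1_update`, the parameter names, and the toy dimensions are assumptions, and it implements the common CD-1 recipe (one Gibbs step from the data, then a gradient estimate from data statistics minus reconstruction statistics).

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b, c, v0, lr=0.1):
    """One CD-1 update for a binary RBM.

    W: (n_visible, n_hidden) weights; b: visible bias; c: hidden bias;
    v0: (batch, n_visible) batch of binary data vectors.
    Updates W, b, c in place and returns them.
    """
    # Positive phase: hidden probabilities and a sample, given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # One Gibbs step: reconstruct the visibles, then recompute hidden probs.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)

    # CD-1 gradient estimate: <data statistics> - <reconstruction statistics>.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c
```

Running this update repeatedly on a small batch of binary vectors is the "few steps" regime the abstract refers to: the chain is restarted at the data every update instead of being run to equilibrium.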


Empirical Analysis of the Divergence of Gibbs Sampling Based Learning Algorithms for Restricted Boltzmann Machines

The results indicate that the log-likelihood tends to diverge, especially when the target distribution is difficult for the RBM to learn, and that weight decay with a carefully chosen weight-decay parameter can prevent this divergence.

Neighborhood-Based Stopping Criterion for Contrastive Divergence

This manuscript presents a simple and cheap alternative to the reconstruction error as a stopping criterion for CD learning, based on including information from states neighboring the training set.

A Neighbourhood-Based Stopping Criterion for Contrastive Divergence Learning

This manuscript investigates simple alternatives to the reconstruction error as stopping criteria for CD learning, based on including information from states neighboring the training set.

A Cyclic Contrastive Divergence Learning Algorithm for High-Order RBMs

Experimental results show that CCD is more broadly applicable and consistently outperforms standard CD in both convergence speed and performance; both CCD and standard CD are analyzed theoretically, and the analysis reveals the source of CCD's advantage.

On the Convergence Properties of Contrastive Divergence

This paper analyzes the CD-1 update rule for Restricted Boltzmann Machines with binary variables, and shows, using Brouwer's fixed-point theorem, that the regularized CD update has a fixed point for a large class of regularization functions.

Convergence of contrastive divergence algorithm in exponential family

This paper studies the asymptotic properties of the CD algorithm in canonical exponential families, which are special cases of the energy-based model, and proves that there exists some bounded $m$ such that any limit point of the time average of the parameter iterates is a consistent estimate of the true parameter.

Learning with Blocks: Composite Likelihood and Contrastive Divergence

This paper shows that composite likelihoods can be stochastically optimized by performing a variant of contrastive divergence with random-scan blocked Gibbs sampling, and demonstrates that using higher-order blocks improves both the accuracy of parameter estimates and the rate of convergence.

Contrastive Divergence Learning with Chained Belief Propagation

This work proposes contrastive divergence learning with chained belief propagation (BPChain-CD), which learns better models compared with BP-CD and CD on a range of maximum-likelihood learning experiments.

Justifying and Generalizing Contrastive Divergence

An expansion of the log-likelihood in undirected graphical models such as the restricted Boltzmann machine (RBM) is presented, in which each term is associated with a sample in a Gibbs chain alternating between two random variables; the residual term is shown to converge to zero, justifying truncation to a short Gibbs chain.

An analysis of contrastive divergence learning in Gaussian Boltzmann machines

This paper analyses the mean and variance of the parameter update obtained after a given number of Gibbs sampling steps for a simple Gaussian BM, and shows that CD learning produces (as expected) a biased estimate of the true parameter update.

Failures of the One-Step Learning Algorithm

Examples are given of models E(x; w) and Markov chains T for which the true likelihood is unimodal in the parameters, yet the one-step algorithm does not necessarily converge to the maximum-likelihood parameters.

The Convergence of Contrastive Divergences

  • A. Yuille
  • Computer Science, Mathematics
  • 2004
This paper relates the Contrastive Divergence algorithm to the stochastic approximation literature, derives elementary conditions that ensure convergence, and conjectures that far stronger results can be obtained by applying more advanced techniques such as those described by Younes.

Training Products of Experts by Minimizing Contrastive Divergence

It is hard even to approximate the derivatives of the renormalization term in the combination rule; nevertheless, a product of experts (PoE) is an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary.

Stochastic Neighbor Embedding

This probabilistic framework makes it easy to represent each object by a mixture of widely separated low-dimensional images, which allows ambiguous objects, like the document count vector for the word "bank", to have versions close to the images of both "river" and "finance" without forcing the image of outdoor concepts to be located close to those of corporate concepts.

Energy-Based Models for Sparse Overcomplete Representations

A new way of extending independent components analysis (ICA) to overcomplete representations is presented, which defines features as deterministic (linear) functions of the inputs and assigns energies to the features through the Boltzmann distribution.

Markov Random Field Modeling in Image Analysis

  • S. Li
  • Computer Science
    Computer Science Workbench
  • 2001
This detailed and thoroughly enhanced third edition presents a comprehensive study of, and reference for, theories, methodologies and recent developments in solving computer vision problems based on MRFs, statistics and optimisation.

Unsupervised Learning of Distributions of Binary Vectors Using 2-Layer Networks

It is shown that arbitrary distributions of binary vectors can be approximated by the combination model, that the weight vectors in the model can be interpreted as high-order correlation patterns among the input bits, and that the combination machine can be used as a mechanism for detecting these patterns.

Image Analysis, Random Fields and Markov Chain Monte Carlo Methods: A Mathematical Introduction (Stochastic Modelling and Applied Probability)

This paper presents Bayesian Image Analysis: Introduction, a meta-analysis of Bayesian Texture Classification and its Applications, with a focus on Metropolis Algorithms and Spectral Methods.

Multiscale conditional random fields for image labeling

An approach is presented for incorporating contextual features into a probabilistic framework for labeling images, in which each pixel is assigned one of a finite set of labels; the framework combines the outputs of several components.