• Corpus ID: 221668990

Contrastive Divergence Learning with Chained Belief Propagation

Fan Ding and Yexiang Xue. European Workshop on Probabilistic Graphical Models.

Contrastive Divergence (CD) is an important maximum-likelihood learning approach for probabilistic graphical models. CD maximizes the difference in likelihood between the observed data and those sampled from the current model distribution using Markov Chain Monte Carlo (MCMC). Nevertheless, the overall performance of CD is hampered by the slow mixing rate of MCMC in the presence of combinatorial constraints. A competing approach BP-CD replaces MCMC with Belief Propagation (BP). However, their… 
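The CD idea described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's method: it uses a small fully visible binary model p(x) ∝ exp(b·x + Σ_{i<j} W_ij x_i x_j) with no combinatorial constraints, and plain Gibbs sampling as the MCMC step. The function names (`gibbs_sweep`, `cd_gradient`, `train_cd`) and the toy data are hypothetical.

```python
import numpy as np

def gibbs_sweep(x, b, W, rng):
    """One full Gibbs sweep over binary states x of shape (n, d), updated in place."""
    for i in range(x.shape[1]):
        logit = b[i] + x @ W[:, i]          # W has a zero diagonal, so x_i drops out
        p_on = 1.0 / (1.0 + np.exp(-logit))
        x[:, i] = rng.random(x.shape[0]) < p_on
    return x

def cd_gradient(data, b, W, k, rng):
    """CD-k gradient estimate: data statistics minus statistics of k-step samples."""
    samples = data.copy()                   # chains start at the observed data
    for _ in range(k):
        gibbs_sweep(samples, b, W, rng)
    grad_b = data.mean(axis=0) - samples.mean(axis=0)
    grad_W = (data.T @ data - samples.T @ samples) / data.shape[0]
    np.fill_diagonal(grad_W, 0.0)           # no self-interactions
    return grad_b, grad_W

def train_cd(data, k=1, steps=200, lr=0.1, seed=0):
    """Gradient ascent on the CD-k objective (hypothetical hyperparameters)."""
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    b, W = np.zeros(d), np.zeros((d, d))
    for _ in range(steps):
        grad_b, grad_W = cd_gradient(data, b, W, k, rng)
        b += lr * grad_b
        W += lr * grad_W
    return b, W

# Toy data: x1 copies x0 90% of the time, x2 is an independent rare bit.
rng = np.random.default_rng(1)
x0 = rng.random(500) < 0.5
x1 = np.where(rng.random(500) < 0.9, x0, ~x0)
x2 = rng.random(500) < 0.2
data = np.stack([x0, x1, x2], axis=1).astype(float)
b, W = train_cd(data)
```

Because each negative sample comes from only k Gibbs sweeps started at the data, the quality of the gradient estimate depends on how quickly the chain mixes, which is the weakness the abstract attributes to MCMC-based CD.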

Learning Combinatorial Structures via Markov Random Fields with Sampling through Lovász Local Lemma

NELSON, a neural network based on the Lovász Local Lemma that is guaranteed to generate samples satisfying combinatorial constraints from the distribution of the constrained Markov Random Field (MRF) model, is developed and presented as a fully differentiable contrastive-divergence-based learning framework on constrained MRFs.

XOR-CD: Linearly Convergent Constrained Structure Generation

XOR-CD harnesses XOR-Sampling to generate samples from the model distribution in CD learning. It is guaranteed to generate valid structures and, when learning exponential family models, has a linear convergence rate towards the global maximum of the likelihood function, to within a vanishing constant.

Provable Constrained Stochastic Convex Optimization with XOR-Projected Gradient Descent

A novel algorithm is proposed that couples Projected Gradient Descent (PGD) with the XOR sampler and, given a proper step size, is guaranteed to solve the constrained stochastic convex optimization problem at a linear convergence rate.

XOR-SGD: provable convex stochastic optimization for decision-making under uncertainty

This work presents XOR-SGD, a stochastic gradient descent (SGD) approach guaranteed to converge to solutions that are at most a constant away from the true optimum in a linear number of iterations, and shows that this approach finds better solutions with drastically fewer samples needed compared to a couple of state-of-the-art solvers.



On Contrastive Divergence Learning

The properties of CD learning are studied and it is shown that it provides biased estimates in general, but that the bias is typically very small.

Convergence of contrastive divergence algorithm in exponential family

This paper studies the asymptotic properties of the CD algorithm in canonical exponential families, which are special cases of the energy-based model and proves the existence of some bounded $m$ such that any limit point of the time average of any given parameter is a consistent estimate for the true parameter.

Belief Propagation in Conditional RBMs for Structured Prediction

It is demonstrated that, in both maximum likelihood and max-margin learning, training conditional RBMs with BP as the inference routine can provide significantly better results than current state-of-the-art CD methods on structured prediction problems.

A Contrastive Divergence for Combining Variational Inference and MCMC

This work develops a method to combine Markov chain Monte Carlo (MCMC) and variational inference (VI), leveraging the advantages of both inference approaches, and introduces the variational contrastive divergence (VCD), a new divergence that replaces the standard Kullback-Leibler (KL) divergence used in VI.

Using fast weights to improve persistent contrastive divergence

It is shown that the weight updates force the Markov chain to mix fast, and using this insight, an even faster mixing chain is developed that uses an auxiliary set of "fast weights" to implement a temporary overlay on the energy landscape.

CODA: convergence diagnosis and output analysis for MCMC

The coda package for R, which supports Bayesian inference with Markov Chain Monte Carlo, contains a set of functions designed to help the user answer questions about how many samples are required to accurately estimate posterior quantities of interest.

Loopy Belief Propagation for Approximate Inference: An Empirical Study

This paper compares the marginals computed using loopy propagation to the exact ones in four Bayesian network architectures, including two real-world networks: ALARM and QMR, and finds that the loopy beliefs often converge and when they do, they give a good approximation to the correct marginals.
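The kind of comparison described here, loopy BP marginals against exact ones, can be reproduced on a toy model. The sketch below is an illustration under assumptions, not the paper's experimental setup: it uses a hypothetical four-node cycle of binary variables with weak attractive couplings, parallel sum-product updates, and brute-force enumeration for the exact marginals. All names and potentials are invented for the example.

```python
import itertools
import numpy as np

def loopy_bp(unary, pair, edges, iters=50):
    """Sum-product loopy BP on a pairwise binary MRF.

    unary[i] is a length-2 potential for node i; pair[(i, j)] is a 2x2
    potential indexed [x_i, x_j] for each edge (i, j) in edges.
    """
    n = len(unary)
    # One message per directed edge, initialised to uniform.
    msgs = {(i, j): np.ones(2) / 2 for a, b in edges for i, j in ((a, b), (b, a))}
    nbrs = {i: [j for (j, t) in msgs if t == i] for i in range(n)}
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            # Unary potential times all messages into i, except the one from j.
            belief = unary[i].copy()
            for k in nbrs[i]:
                if k != j:
                    belief *= msgs[(k, i)]
            psi = pair[(i, j)] if (i, j) in pair else pair[(j, i)].T
            m = psi.T @ belief              # sum over x_i
            new[(i, j)] = m / m.sum()
        msgs = new                          # parallel (synchronous) update
    marg = np.array([unary[i] * np.prod([msgs[(k, i)] for k in nbrs[i]], axis=0)
                     for i in range(n)])
    return marg / marg.sum(axis=1, keepdims=True)

def exact_marginals(unary, pair, edges):
    """Brute-force marginals by enumerating all 2^n joint states."""
    n = len(unary)
    marg = np.zeros((n, 2))
    for x in itertools.product((0, 1), repeat=n):
        w = np.prod([unary[i][x[i]] for i in range(n)])
        w *= np.prod([pair[(i, j)][x[i], x[j]] for i, j in edges])
        for i in range(n):
            marg[i, x[i]] += w
    return marg / marg.sum(axis=1, keepdims=True)

# Hypothetical 4-cycle with weak couplings that favour equal neighbours.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
coupling = np.array([[np.exp(0.3), 1.0], [1.0, np.exp(0.3)]])
pair = {e: coupling for e in edges}
unary = np.array([[1.0, 2.0], [1.5, 1.0], [1.0, 1.0], [2.0, 1.0]])

bp = loopy_bp(unary, pair, edges)
exact = exact_marginals(unary, pair, edges)
max_error = np.abs(bp - exact).max()
```

With couplings this weak the messages settle quickly and `max_error` is small; stronger couplings or denser graphs are where loopy BP may oscillate or drift from the exact marginals.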

On the Convergence Properties of Contrastive Divergence

This paper analyzes the CD1 update rule for Restricted Boltzmann Machines with binary variables, and shows that the regularized CD update has a fixed point for a large class of regularization functions using Brouwer's fixed-point theorem.

Generalized Belief Propagation

It is shown that BP can only converge to a stationary point of an approximate free energy, known as the Bethe free energy in statistical physics, and generalized belief propagation (GBP) versions of these Kikuchi approximations are derived.

ACE: adaptive cluster expansion for maximum entropy graphical model inference

The adaptive cluster expansion (ACE) method to quickly and accurately infer Ising or Potts models based on correlation data is described and it is shown that models inferred by ACE have substantially better statistical performance compared to those obtained from faster Gaussian and pseudo-likelihood methods.