• Corpus ID: 211096797

Cutting out the Middle-Man: Training and Evaluating Energy-Based Models without Sampling

Will Grathwohl, Kuan-Chieh Wang, Jörn-Henrik Jacobsen, David Kristjanson Duvenaud, Richard S. Zemel

We present a new method for evaluating and training unnormalized density models. Our approach only requires access to the gradient of the unnormalized model's log-density. We estimate the Stein discrepancy between the data density p(x) and the model density q(x) defined by a vector function of the data. We parameterize this function with a neural network and fit its parameters to maximize the discrepancy. This yields a novel goodness-of-fit test which outperforms existing methods on high… 
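
As a rough illustration of the quantity the abstract describes, the sketch below estimates a one-dimensional Stein discrepancy E_p[s_q(x) f(x) + f'(x)], where s_q is the gradient of the model's log-density. It is not the authors' implementation: the model q = N(0, 1), the fixed critic f(x) = tanh(x), and all function names are assumptions chosen for illustration, whereas the paper fits the critic with a neural network to maximize the discrepancy.

```python
import math
import random

def score_q(x):
    # Score of the assumed model q = N(0, 1): d/dx log q(x) = -x.
    # Only this gradient is needed; the normalizing constant never appears.
    return -x

def critic(x):
    # Hypothetical fixed critic; the paper trains a neural network instead.
    return math.tanh(x)

def critic_grad(x):
    # Derivative of tanh(x).
    return 1.0 - math.tanh(x) ** 2

def stein_discrepancy(samples):
    # Monte Carlo estimate of E_p[ score_q(x) * f(x) + f'(x) ].
    total = sum(score_q(x) * critic(x) + critic_grad(x) for x in samples)
    return total / len(samples)

rng = random.Random(0)
matched = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # data drawn from q itself
shifted = [rng.gauss(1.0, 1.0) for _ in range(20000)]  # data drawn from N(1, 1)

sd_matched = stein_discrepancy(matched)
sd_shifted = stein_discrepancy(shifted)
print(sd_matched, sd_shifted)
```

When the data come from the model, the estimate should be close to zero by Stein's identity; when the data are shifted, its magnitude is clearly larger, which is the signal a goodness-of-fit test can use.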

How to Train Your Energy-Based Models

This tutorial starts by explaining maximum likelihood training with Markov chain Monte Carlo (MCMC), and proceeds to elaborate on MCMC-free approaches, including Score Matching and Noise Contrastive Estimation, to highlight theoretical connections among these three approaches.

Sliced Kernelized Stein Discrepancy

A novel particle inference method called sliced Stein variational gradient descent (S-SVGD) is proposed which alleviates the mode-collapse issue of SVGD in training variational autoencoders and significantly outperforms KSD and various baselines in high dimensions.

Learning Energy-Based Models by Diffusion Recovery Likelihood

This work presents a diffusion recovery likelihood method to tractably learn and sample from a sequence of EBMs trained on increasingly noisy versions of a dataset, and demonstrates that unlike previous work on EBMs, long-run MCMC samples from the conditional distributions do not diverge and still represent realistic images, allowing us to accurately estimate the normalized density of data even for high-dimensional datasets.

Bridging Explicit and Implicit Deep Generative Models via Neural Stein Estimators

A novel joint training framework is proposed that bridges an explicit (unnormalized) density estimator and an implicit sample generator via Stein discrepancy. The framework induces mutual regularization through kernel Sobolev norm penalization and Moreau-Yosida regularization, and stabilizes the training dynamics.


Improved Contrastive Divergence Training of Energy Based Models

It is shown that a gradient term neglected in the popular contrastive divergence formulation is both tractable to estimate and important to avoid training instabilities in previous models, and how data augmentation, multi-scale processing, and reservoir sampling can be used to improve model robustness and generation quality.

Telescoping Density-Ratio Estimation

This work introduces a new framework, telescoping density-ratio estimation (TRE), that enables the estimation of ratios between highly dissimilar densities in high-dimensional spaces and demonstrates that TRE can yield substantial improvements over existing single-ratio methods for mutual information estimation, representation learning and energy-based modelling.

Diffusion Models: A Comprehensive Survey of Methods and Applications

A comprehensive review of existing variants of the diffusion models and a thorough investigation into the applications of diffusion models, including computer vision, natural language processing, waveform signal processing, multi-modal modeling, molecular graph generation, time series modeling, and adversarial purification.

Adversarial Stein Training for Graph Energy Models

This work uses an energy-based model based on multi-channel graph neural networks to learn permutation-invariant unnormalized density functions on graphs by minimizing an adversarial Stein discrepancy, and finds that this approach achieves competitive results on graph generation compared to benchmark models.

Stein Variational Gaussian Processes

SVGD provides a non-parametric alternative to variational inference which is substantially faster than MCMC but unhindered by parametric assumptions, and it is proved that for GP models with Lipschitz gradients the SVGD algorithm monotonically decreases the Kullback-Leibler divergence from the sampling distribution to the true posterior.



On the Anatomy of MCMC-based Maximum Likelihood Learning of Energy-Based Models

This study investigates the effects of Markov chain Monte Carlo (MCMC) sampling in unsupervised Maximum Likelihood (ML) learning and identifies a variety of ML learning outcomes that depend solely on the implementation of MCMC sampling.

Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One

This approach is the first to achieve performance rivaling the state-of-the-art in both generative and discriminative learning within one hybrid model.

Deep Energy Estimator Networks

A promising solution to density estimation is given here in an inference-free hierarchical framework that is built on score matching, and a multilayer perceptron with a scalable objective for learning the energy is constructed.

Flow Contrastive Estimation of Energy-Based Models

A significant improvement on the synthesis quality of the flow model is demonstrated, and the effectiveness of unsupervised feature learning by the learned energy-based model is shown, which can be easily adapted to semi-supervised learning.

Noise-contrastive estimation: A new estimation principle for unnormalized statistical models

A new estimation principle is presented to perform nonlinear logistic regression to discriminate between the observed data and some artificially generated noise, using the model log-density function in the regression nonlinearity, which leads to a consistent (convergent) estimator of the parameters.
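
The estimation principle summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code: the model is an unnormalized one-dimensional Gaussian with a learnable log-normalizer `c`, the noise is N(0, 2^2), data and noise sets have equal size, and every function name here is hypothetical.

```python
import math
import random

LOG_2PI = math.log(2.0 * math.pi)

def log_model(x, c):
    # Unnormalized Gaussian log-density plus a learnable log-normalizer c.
    return -0.5 * x * x + c

def log_noise(x, sigma=2.0):
    # Normalized log-density of the assumed noise distribution N(0, sigma^2).
    return -0.5 * (x / sigma) ** 2 - math.log(sigma) - 0.5 * LOG_2PI

def log_sigmoid(z):
    # Numerically stable log(1 / (1 + exp(-z))).
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

def nce_loss(data, noise, c):
    # Logistic-regression loss whose logit is log_model(u) - log_noise(u).
    data_term = sum(log_sigmoid(log_model(x, c) - log_noise(x)) for x in data)
    noise_term = sum(log_sigmoid(-(log_model(y, c) - log_noise(y))) for y in noise)
    return -(data_term / len(data) + noise_term / len(noise))

rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # observed data ~ N(0, 1)
noise = [rng.gauss(0.0, 2.0) for _ in range(5000)]  # artificial noise ~ N(0, 4)

true_c = -0.5 * LOG_2PI  # exact log-normalizer of N(0, 1)
loss_true = nce_loss(data, noise, true_c)
loss_high = nce_loss(data, noise, true_c + 2.0)
loss_low = nce_loss(data, noise, true_c - 2.0)
print(loss_true, loss_high, loss_low)
```

The loss is lower at the correct log-normalizer than at either perturbed value, which is how the classifier objective recovers the normalizing constant as an ordinary parameter.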

Generative Modeling by Estimating Gradients of the Data Distribution

A new generative model where samples are produced via Langevin dynamics using gradients of the data distribution estimated with score matching, which allows flexible model architectures, requires no sampling during training or the use of adversarial methods, and provides a learning objective that can be used for principled model comparisons.
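
A minimal sketch of the sampling step described above, using the standard unadjusted Langevin update x <- x + (eps / 2) * score(x) + sqrt(eps) * z. The score here is the known analytic score of N(3, 1), which is an assumption made to keep the example self-contained; in the paper the score is estimated with score matching. The names `score` and `langevin_sample` are hypothetical.

```python
import math
import random

def score(x, mu=3.0):
    # Assumed known score of N(mu, 1): d/dx log p(x) = -(x - mu).
    return -(x - mu)

def langevin_sample(rng, steps=200, step_size=0.1):
    x = rng.gauss(0.0, 1.0)  # arbitrary initialization away from the target
    for _ in range(steps):
        z = rng.gauss(0.0, 1.0)
        # Drift along the score plus injected Gaussian noise.
        x = x + 0.5 * step_size * score(x) + math.sqrt(step_size) * z
    return x

rng = random.Random(0)
samples = [langevin_sample(rng) for _ in range(1000)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
print(sample_mean, sample_var)
```

With a small step size the chains settle near the target mean of 3 with variance close to 1; the discretization adds a small bias to the variance that shrinks as the step size decreases.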

Implicit Generation and Generalization in Energy-Based Models

This work presents techniques to scale MCMC-based EBM training on continuous neural networks, and shows its success on the high-dimensional data domains of ImageNet32x32, ImageNet128x128, CIFAR-10, and robotic hand trajectories, achieving better samples than other likelihood models and nearing the performance of contemporary GAN approaches.

Conditional Noise-Contrastive Estimation of Unnormalised Models

The proposed method shares with NCE the idea of formulating density estimation as a supervised learning problem, but in contrast to NCE it leverages the observed data when generating noise samples, so the noise can be generated in a semi-automated manner.

Sliced Score Matching: A Scalable Approach to Density and Score Estimation

It is demonstrated that sliced score matching can learn deep energy-based models effectively, and can produce accurate score estimates for applications such as variational inference with implicit distributions and training Wasserstein Auto-Encoders.
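
For orientation, the sliced score matching objective can be written as the average of v^T (ds/dx) v + 0.5 * (v^T s(x))^2 over data points x and random projection vectors v. The sketch below evaluates it for a toy score model s_theta(x) = -theta * x on two-dimensional standard Gaussian data. The toy model, the analytic Jacobian, and the name `ssm_loss` are assumptions for illustration; a deep energy-based model would obtain the Jacobian-vector product by automatic differentiation.

```python
import random

def ssm_loss(theta, xs, vs):
    # Toy score model s_theta(x) = -theta * x, whose Jacobian is -theta * I.
    total = 0.0
    for x, v in zip(xs, vs):
        v_dot_s = sum(vi * (-theta * xi) for vi, xi in zip(v, x))
        v_jac_v = -theta * sum(vi * vi for vi in v)  # v^T (ds/dx) v
        total += v_jac_v + 0.5 * v_dot_s ** 2
    return total / len(xs)

rng = random.Random(0)
dim, n = 2, 5000
xs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]  # data ~ N(0, I)
vs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]  # random slices

losses = {theta: ssm_loss(theta, xs, vs) for theta in (0.5, 1.0, 2.0)}
print(losses)
```

The true score of N(0, I) is -x, so theta = 1 gives the lowest loss of the three candidates.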

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
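
The update rule can be sketched for a single scalar parameter as follows. The default hyperparameters beta1 = 0.9, beta2 = 0.999 and eps = 1e-8 follow the commonly cited values; the learning rate, step count, test function, and the name `adam_minimize` are assumptions chosen for this example.

```python
import math

def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g      # running estimate of the first moment
        v = beta2 * v + (1 - beta2) * g * g  # running estimate of the second moment
        m_hat = m / (1 - beta1 ** t)         # bias correction for zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)
```

Dividing by the square root of the second-moment estimate makes the step size roughly invariant to the scale of the gradient, which is the adaptive behaviour the summary refers to.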