Pseudo-Spherical Contrastive Divergence
@inproceedings{Yu2021PseudoSphericalCD,
  title     = {Pseudo-Spherical Contrastive Divergence},
  author    = {Lantao Yu and Jiaming Song and Yang Song and Stefano Ermon},
  booktitle = {NeurIPS},
  year      = {2021}
}
Energy-based models (EBMs) offer flexible distribution parametrization. However, due to the intractable partition function, they are typically trained via contrastive divergence for maximum likelihood estimation. In this paper, we propose pseudo-spherical contrastive divergence (PS-CD) to generalize maximum likelihood learning of EBMs. PS-CD is derived from the maximization of a family of strictly proper homogeneous scoring rules, which avoids the computation of the intractable partition function.
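For context, a minimal sketch of the scoring-rule family the abstract alludes to, following the standard definition of the pseudo-spherical score from the scoring-rule literature (the symbols gamma, q, and x below are our notation for the order of the score, the model density, and a data point; they are not taken from the paper):

  S_\gamma(x, q) \;=\; \frac{q(x)^{\gamma}}{\bigl(\int q(y)^{\gamma+1}\,dy\bigr)^{\gamma/(\gamma+1)}}, \qquad \gamma > 0.

The score is positively homogeneous of degree zero in q: replacing q by c\,q leaves S_\gamma unchanged, so an unnormalized EBM density e^{-E_\theta(x)} can be plugged in directly and the partition function cancels. Moreover, (1/\gamma)\log S_\gamma(x, q) \to \log\bigl(q(x)/\!\int q(y)\,dy\bigr) as \gamma \to 0, which is how a scoring-rule objective of this form can recover maximum likelihood learning as a limiting case.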