Corpus ID: 240354544

Pseudo-Spherical Contrastive Divergence

@inproceedings{Yu2021PseudoSphericalCD,
  title={Pseudo-Spherical Contrastive Divergence},
  author={Lantao Yu and Jiaming Song and Yang Song and Stefano Ermon},
  booktitle={NeurIPS},
  year={2021}
}
Energy-based models (EBMs) offer flexible distribution parametrization. However, because their partition function is intractable, they are typically trained via contrastive divergence for maximum likelihood estimation. In this paper, we propose pseudo-spherical contrastive divergence (PS-CD) to generalize maximum likelihood learning of EBMs. PS-CD is derived from the maximization of a family of strictly proper homogeneous scoring rules, which avoids computing the intractable partition function…
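
For context, the pseudo-spherical scoring rule that gives the method its name has the following standard form (a sketch based on the scoring-rule literature; the paper's notation and exact parametrization of the hyperparameter γ may differ):

\[
S_\gamma(\mathbf{x}, p) \;=\; \frac{p(\mathbf{x})^{\gamma}}{\left( \int p(\mathbf{y})^{\gamma+1} \, \mathrm{d}\mathbf{y} \right)^{\gamma/(\gamma+1)}}, \qquad \gamma > 0 .
\]

The right-hand side is unchanged under rescaling \(p \mapsto c\,p\) (homogeneity of degree zero), so an unnormalized EBM density \(\tilde{p}_\theta(\mathbf{x}) = \exp(-E_\theta(\mathbf{x}))\) can be substituted directly and the partition function cancels. For a normalized density, the rescaled log score \(\tfrac{1}{\gamma}\log S_\gamma(\mathbf{x}, p) = \log p(\mathbf{x}) - \tfrac{1}{\gamma+1}\log \int p(\mathbf{y})^{\gamma+1}\,\mathrm{d}\mathbf{y}\) tends to \(\log p(\mathbf{x})\) as γ → 0, which is consistent with PS-CD generalizing maximum likelihood learning.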
