Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One

@article{Wang2022YourAG,
  title={Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One},
  author={Yezhen Wang and Tong Che and Bo Li and Kaitao Song and Hengzhi Pei and Yoshua Bengio and Dongsheng Li},
  journal={ArXiv},
  year={2022},
  volume={abs/2206.12840}
}
Autoregressive generative models are widely used, especially for tasks involving sequential data. However, they have been plagued by a number of inherent flaws due to the intrinsic characteristics of chain-style conditional modeling (e.g., exposure bias and a lack of long-range coherence), which severely limit their ability to model distributions properly. In this paper, we propose a unique method termed E-ARM for training autoregressive generative models that takes advantage of a well…
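
For context, the "chain-style conditional modeling" in the abstract is the standard autoregressive factorization, and the energy-based view rewrites the same distribution as a globally normalized, unnormalized density (a generic sketch in my notation, not necessarily the paper's exact formulation):

    p_\theta(x) = \prod_{t=1}^{T} p_\theta(x_t \mid x_{<t}) \qquad \text{vs.} \qquad p_\theta(x) = \frac{\exp(-E_\theta(x))}{Z(\theta)}, \quad Z(\theta) = \sum_{x'} \exp(-E_\theta(x')).

Each conditional in the factorization is normalized locally by a softmax, so an error made early in sampling cannot be revised later (one source of exposure bias); a globally normalized energy E_\theta scores entire sequences at once, which is the property an energy-based treatment exploits.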


References

Showing 1–10 of 50 references

Anytime Sampling for Autoregressive Models via Ordered Autoencoding

TLDR
Inspired by Principal Component Analysis, this work proposes a new family of autoregressive models that enable anytime sampling by learning a structured representation space whose dimensions are ordered by their importance for reconstruction.
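
One plausible form of the ordered objective (a sketch under my assumptions, in the spirit of nested dropout; the encoder e, decoder D, and code length K are illustrative names):

    \mathcal{L}(x) = \mathbb{E}_{k \sim \mathcal{U}\{1,\dots,K\}} \left[ \lVert x - D(e_{1:k}(x)) \rVert_2^2 \right],

where e_{1:k}(x) keeps only the first k code dimensions. Earlier dimensions are forced to carry more information, as in PCA, so an autoregressive model over the ordered code can be truncated at any k during sampling.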

PixelSNAIL: An Improved Autoregressive Generative Model

TLDR
This work introduces a new generative-model architecture that combines causal convolutions with self-attention and presents state-of-the-art log-likelihood results on CIFAR-10 and ImageNet.
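
The causal self-attention that PixelSNAIL interleaves with convolutions can be sketched as standard masked attention (generic notation, not the paper's exact parameterization):

    \mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top + M}{\sqrt{d}}\right) V, \qquad M_{ij} = \begin{cases} 0 & j \le i \\ -\infty & j > i \end{cases}

The mask M preserves the autoregressive ordering (position i attends only to positions j \le i), while attention supplies the unbounded receptive field that stacked convolutions alone lack.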

Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling

TLDR
Discriminator Driven Latent Sampling is shown to be highly efficient compared to previous methods that operate in the high-dimensional pixel space; it can be applied to improve previously trained GANs of many types, and it achieves a new state of the art in unconditional image synthesis without introducing extra parameters or additional training.
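
The construction can be sketched as follows (notation mine): with generator G, latent prior p_0, and discriminator logit d, the implied energy on the latent space is

    E(z) = -\log p_0(z) - d(G(z)),

and samples are refined by Langevin dynamics in z, e.g. z_{t+1} = z_t - \frac{\epsilon}{2} \nabla_z E(z_t) + \sqrt{\epsilon}\, \eta_t with \eta_t \sim \mathcal{N}(0, I). Running the chain in the low-dimensional latent space is what makes the method cheap compared to pixel-space MCMC.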

No MCMC for me: Amortized sampling for fast and stable training of energy-based models

TLDR
This work presents a simple method for training EBMs at scale that uses an entropy-regularized generator to amortize the MCMC sampling typically required in EBM training, improving upon prior MCMC-based entropy-regularization methods with a fast variational approximation.
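
A rough sketch of the amortized scheme (my notation): a generator q_\phi stands in for MCMC in the negative phase of the maximum-likelihood EBM gradient,

    \nabla_\theta \mathcal{L} \approx \mathbb{E}_{x \sim p_{\text{data}}}[\nabla_\theta E_\theta(x)] - \mathbb{E}_{x \sim q_\phi}[\nabla_\theta E_\theta(x)],

while q_\phi is trained to track the EBM by maximizing \mathbb{E}_{q_\phi}[-E_\theta(x)] + \mathcal{H}(q_\phi); the entropy term \mathcal{H}(q_\phi), which keeps the generator from collapsing, is what the fast variational approximation estimates.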

Autoregressive Energy Machines

TLDR
The Autoregressive Energy Machine is proposed: an energy-based model that simultaneously learns an unnormalized density and computes an importance-sampling estimate of the normalizing constant for each conditional in an autoregressive decomposition. It achieves state-of-the-art performance on a suite of density-estimation tasks.
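
Concretely (in generic notation), each conditional is modeled with an unnormalized energy, and its normalizer is estimated by importance sampling under a tractable proposal q fitted alongside the energy:

    p(x_i \mid x_{<i}) = \frac{\exp(-E_i(x_i; x_{<i}))}{Z_i(x_{<i})}, \qquad Z_i(x_{<i}) \approx \frac{1}{S} \sum_{s=1}^{S} \frac{\exp(-E_i(x^{(s)}; x_{<i}))}{q(x^{(s)} \mid x_{<i})}, \quad x^{(s)} \sim q(\cdot \mid x_{<i}).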

Residual Energy-Based Models for Text

TLDR
This work asks whether incorporating (globally normalized) discriminators into the generative process can improve auto-regressive text models, and finds experimentally that the answer is affirmative when one has access to the model's training data, and guardedly affirmative even when one does not.
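
The residual construction referenced here can be written generically: a globally normalized energy re-weights a pretrained, locally normalized language model,

    p_\theta(x) \propto p_{\text{LM}}(x)\, \exp(-E_\theta(x)),

so the energy only has to model the residual between the language model and the data distribution; a discriminator trained to separate real from generated text supplies E_\theta.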

WaveNet: A Generative Model for Raw Audio

TLDR
WaveNet, a deep neural network for generating raw audio waveforms, is introduced; it can be trained efficiently on data with tens of thousands of samples per second of audio, and it can also be employed as a discriminative model, yielding promising results for phoneme recognition.
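
Two ingredients behind this result can be written out. WaveNet stacks dilated causal convolutions, with the dilation doubling per layer so the receptive field grows exponentially with depth, and uses the gated activation unit

    z = \tanh(W_{f,k} * x) \odot \sigma(W_{g,k} * x),

where * denotes a dilated causal convolution, \odot element-wise multiplication, \sigma the logistic sigmoid, and k the layer index.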

Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One

TLDR
This approach is the first to achieve performance rivaling the state of the art in both generative and discriminative learning within a single hybrid model.
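
The reinterpretation behind the title: the logits f_\theta(x) of a K-class classifier define a joint energy-based model,

    p_\theta(x, y) = \frac{\exp(f_\theta(x)[y])}{Z(\theta)}, \qquad E_\theta(x) = -\log \sum_{y=1}^{K} \exp(f_\theta(x)[y]),

so p_\theta(y \mid x) recovers the usual softmax classifier unchanged, while the marginal p_\theta(x) can additionally be trained as a generative model.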

Generalized Energy Based Models

TLDR
GEBM samples on image-generation tasks are of much better quality than those from the learned generator alone, indicating that, all else being equal, a GEBM will outperform a GAN of the same complexity.
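
Roughly (my sketch of the construction), a GEBM re-weights the base distribution b induced by the generator with a learned energy,

    p(x) \propto b(x)\, \exp(-E(x)),

and, as with discriminator-driven latent sampling above, samples can be drawn by running Langevin dynamics in the generator's latent space rather than in pixel space.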

Deep Directed Generative Models with Energy-Based Probability Estimation

TLDR
Inspired by generative adversarial networks, this work proposes to train a deep directed generative model (not a Markov chain) so that its sampling distribution approximately matches the energy function being trained.
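
A minimal sketch of the training scheme (my notation): the energy function is updated with generator samples standing in for MCMC negatives,

    \nabla_\theta \mathcal{L} = \mathbb{E}_{x \sim p_{\text{data}}}[\nabla_\theta E_\theta(x)] - \mathbb{E}_{z \sim p(z)}[\nabla_\theta E_\theta(G_\phi(z))],

while the generator G_\phi is trained so that its samples have low energy under E_\theta and high entropy, pushing its sampling distribution toward the model distribution defined by the energy.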