Corpus ID: 220380907

Efficient Learning of Generative Models via Finite-Difference Score Matching

Tianyu Pang, Kun Xu, Chongxuan Li, Yang Song, S. Ermon, Jun Zhu
Several machine learning applications involve the optimization of higher-order derivatives (e.g., gradients of gradients) during training, which can be expensive with respect to memory and computation even with automatic differentiation. As a typical example in generative modeling, score matching (SM) involves the optimization of the trace of a Hessian. To improve computing efficiency, we rewrite the SM objective and its variants in terms of directional derivatives, and present a generic strategy…
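The finite-difference idea can be illustrated with a minimal sketch (assumptions: NumPy, and a toy analytic score s(x) = -x standing in for a learned score network; the generic central-difference formula is shown, not necessarily the paper's exact estimator):

```python
import numpy as np

def score(x):
    # Toy score: gradient of the log-density of a standard Gaussian,
    # i.e. s(x) = -x; stands in for a learned score network.
    return -x

def fd_directional_derivative(score_fn, x, v, eps=1e-3):
    # Central finite-difference estimate of v^T (ds/dx) v, which avoids
    # the Hessian-vector product that second-order autodiff would need.
    s_plus = score_fn(x + eps * v)
    s_minus = score_fn(x - eps * v)
    return float(np.dot(v, s_plus - s_minus) / (2.0 * eps))

rng = np.random.default_rng(0)
x = rng.normal(size=5)
v = rng.normal(size=5)
v /= np.linalg.norm(v)  # unit projection direction

# For s(x) = -x the Jacobian is -I, so v^T J v = -||v||^2 = -1.
est = fd_directional_derivative(score, x, v)
print(round(est, 6))
```

Two extra score evaluations replace one Hessian-vector product, trading a small O(eps^2) bias for savings in memory and computation.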
Bi-level Score Matching for Learning Energy-based Latent Variable Models
BiSM is comparable to the widely adopted contrastive divergence and SM methods when they are applicable, and can learn complex EBLVMs with intractable posteriors to generate natural images.
Diffusion models for Handwriting Generation
A diffusion probabilistic model for handwriting generation that incorporates writer stylistic features directly from image data, eliminating the need for user interaction during sampling.
No MCMC for me: Amortized sampling for fast and stable training of energy-based models
This work presents a simple method for training EBMs at scale which uses an entropy-regularized generator to amortize the MCMC sampling typically used in EBM training, and improves upon prior MCMC-based entropy regularization methods with a fast variational approximation.
Score-based Generative Modeling in Latent Space
The Latent Score-based Generative Model (LSGM) is proposed, a novel approach that trains SGMs in a latent space, relying on the variational autoencoder framework, and achieves state-of-the-art likelihood on the binarized OMNIGLOT dataset.
Variational (Gradient) Estimate of the Score Function in Energy-based Latent Variable Models
Variational estimates of the score function and its gradient with respect to the model parameters in a general EBLVM, referred to as VaES and VaGES respectively, are presented.
XOR-CD: Linearly Convergent Constrained Structure Generation
XOR-CD harnesses XOR-Sampling to generate samples from the model distribution in CD learning; it is guaranteed to generate valid structures and has a linear convergence rate towards the global maximum of the likelihood function, within a vanishing constant, in learning exponential family models.
Score-Based Generative Modeling through Stochastic Differential Equations
This work presents a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise.


Auto-Encoding Variational Bayes
A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
Efficient and principled score estimation with Nyström kernel exponential families
Compared to an existing score learning approach using a denoising autoencoder, the estimator is empirically more data-efficient when estimating the score, runs faster, and has fewer parameters (which can be tuned in a principled and interpretable way), in addition to providing statistical guarantees.
Sliced Score Matching: A Scalable Approach to Density and Score Estimation
It is demonstrated that sliced score matching can learn deep energy-based models effectively, and can produce accurate score estimates for applications such as variational inference with implicit distributions and training Wasserstein Auto-Encoders.
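The sliced objective summarized above can be sketched with a toy score whose Jacobian is known in closed form (assumptions: NumPy; s(x) = -x with Jacobian -I is a stand-in for a learned model, which would instead obtain v^T J v via autodiff or finite differences):

```python
import numpy as np

def score(x):
    # Analytic score of N(0, I): s(x) = -x, with Jacobian J = -I.
    return -x

def score_jacobian(x):
    return -np.eye(x.size)

def sliced_sm_loss(x, n_projections=64, seed=0):
    # Monte-Carlo estimate of E_v[ v^T J v + 0.5 * (v^T s(x))^2 ]
    # over random projection directions v ~ N(0, I).
    rng = np.random.default_rng(seed)
    s, J = score(x), score_jacobian(x)
    total = 0.0
    for _ in range(n_projections):
        v = rng.normal(size=x.size)
        total += v @ J @ v + 0.5 * (v @ s) ** 2
    return total / n_projections

x = np.array([1.0, -2.0, 0.5])
print(sliced_sm_loss(x))
```

For this toy score the exact expectation is -dim + 0.5 * ||x||^2 = -0.375; the Monte-Carlo estimate fluctuates around that value.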
Adam: A Method for Stochastic Optimization
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
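The moment estimates mentioned above can be written out directly (a minimal NumPy sketch of the exponential moving averages with bias correction; the learning rate and toy objective here are illustrative, not recommended defaults):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and squared gradient...
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    # ...with bias correction for the zero initialization.
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimize f(theta) = theta^2 (gradient 2*theta) starting from theta = 3.
theta, m, v = 3.0, 0.0, 0.0
for t in range(1, 5001):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
print(round(theta, 3))
```

The per-step displacement is bounded by roughly the learning rate, since m_hat / sqrt(v_hat) is close to ±1 for consistent gradients; the iterate settles near the minimum at 0.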
Gradient Estimators for Implicit Models
The Stein gradient estimator is proposed, which directly estimates the score function of the implicitly defined distribution and is empirically demonstrated by examples that include meta-learning for approximate inference, and entropy-regularised GANs that provide improved sample diversity.
Glow: Generative Flow with Invertible 1x1 Convolutions
Glow, a simple type of generative flow using an invertible 1x1 convolution, is proposed, demonstrating that a generative model optimized towards the plain log-likelihood objective is capable of efficient realistic-looking synthesis and manipulation of large images.
Higher Order Contractive Auto-Encoder
A novel regularizer for training an autoencoder for unsupervised feature extraction yields representations that are significantly better suited for initializing deep architectures than previously proposed approaches, beating state-of-the-art performance on a number of datasets.
GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
This work proposes a two time-scale update rule (TTUR) for training GANs with stochastic gradient descent on arbitrary GAN loss functions and introduces the "Fréchet Inception Distance" (FID), which captures the similarity of generated images to real ones better than the Inception Score.
Generative Modeling by Estimating Gradients of the Data Distribution
A new generative model where samples are produced via Langevin dynamics using gradients of the data distribution estimated with score matching, which allows flexible model architectures, requires no sampling during training or the use of adversarial methods, and provides a learning objective that can be used for principled model comparisons.
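The Langevin sampling procedure summarized above can be sketched with an analytic score (assumptions: NumPy; the score of a 1-D standard Gaussian, s(x) = -x, replaces the score network that would be learned by score matching):

```python
import numpy as np

def score(x):
    # Analytic score of the standard Gaussian target: d/dx log p(x) = -x.
    return -x

def langevin_sample(score_fn, n_chains=5000, n_steps=2000, step=1e-2, seed=0):
    # Unadjusted Langevin dynamics:
    #   x <- x + step * score(x) + sqrt(2 * step) * noise
    rng = np.random.default_rng(seed)
    x = 3.0 * rng.normal(size=n_chains)  # overdispersed initialization
    for _ in range(n_steps):
        x = x + step * score_fn(x) + np.sqrt(2.0 * step) * rng.normal(size=n_chains)
    return x

samples = langevin_sample(score)
print(round(samples.mean(), 2), round(samples.std(), 2))
```

For small step sizes the chain's stationary distribution approaches the target N(0, 1), so the printed mean and standard deviation come out near 0 and 1.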
Importance Weighted Autoencoders
The importance weighted autoencoder (IWAE), a generative model with the same architecture as the VAE, but which uses a strictly tighter log-likelihood lower bound derived from importance weighting, shows empirically that IWAEs learn richer latent space representations than VAEs, leading to improved test log-likelihood on density estimation benchmarks.