• Corpus ID: 218971791

Joint Stochastic Approximation and Its Application to Learning Discrete Latent Variable Models

Zhijian Ou, Yunfu Song
Despite progress in introducing auxiliary amortized inference models, learning discrete latent variable models is still challenging. In this paper, we show that the difficulty of obtaining reliable stochastic gradients for the inference model and the drawback of indirectly optimizing the target log-likelihood can both be addressed by a new method based on stochastic approximation (SA) theory of the Robbins-Monro type. Specifically, we propose to directly maximize the target…
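For readers unfamiliar with the term, a stochastic approximation of the Robbins-Monro type can be sketched as follows. This is a generic textbook illustration, not the joint SA algorithm of the paper: the function names (`robbins_monro`, `noisy_mean_field`), the toy target value, and the step-size schedule 1/t are all assumptions chosen for the example. The iteration seeks a root of h(theta) = E[H(theta, noise)] using only noisy evaluations of H.

```python
import random


def robbins_monro(noisy_f, theta0, n_steps=20000, seed=0):
    """Seek theta with E[noisy_f(theta)] = 0 from noisy evaluations only."""
    rng = random.Random(seed)
    theta = theta0
    for t in range(1, n_steps + 1):
        # Step sizes 1/t satisfy the classical conditions:
        # their sum diverges while the sum of their squares is finite.
        gamma = 1.0 / t
        theta = theta - gamma * noisy_f(theta, rng)
    return theta


def noisy_mean_field(theta, rng, target=2.0):
    # Hypothetical toy problem: h(theta) = theta - target,
    # observed with additive standard Gaussian noise.
    return (theta - target) + rng.gauss(0.0, 1.0)


estimate = robbins_monro(noisy_mean_field, theta0=0.0)
print(f"estimate of the root: {estimate:.3f}")
```

In this toy case the iterate reduces to a running average of noisy observations, so it settles near the root at 2.0; in latent variable learning the noisy evaluations would instead come from sampled latent variables.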


Adaptive Strategy for Resetting a Non-stationary Markov Chain during Learning via Joint Stochastic Approximation
Preliminary results based on the recent work by Ou and Song (2020) are reported, which show improvement over the Reweighted-Wake-Sleep algorithm (RWS) (Bornschein and Bengio, 2015) for training deep generative models with discrete latent variables, such as Helmholtz Machine.
Coupled Gradient Estimators for Discrete Latent Variables
Gradient estimators based on reparameterizing categorical variables as sequences of binary variables and Rao-Blackwellization are introduced and it is shown that these proposed categorical gradient estimators provide state-of-the-art performance.
Transport Score Climbing: Variational Inference Using Forward KL and Adaptive Neural Transport
Transport Score Climbing (TSC) is introduced, a method that optimizes KL(p || q) by using Hamiltonian Monte Carlo (HMC) and a novel adaptive transport map and achieves competitive performance when training variational autoencoders on large-scale data.
Markovian Score Climbing: Variational Inference with KL(p||q)
This paper develops a simple algorithm for reliably minimizing the inclusive KL, and provides a new algorithm that melds VI and MCMC, and demonstrates the utility of MSC on Bayesian probit regression for classification as well as a stochastic volatility model for financial data.


Joint Stochastic Approximation learning of Helmholtz Machines
This paper successfully develops a new class of algorithms, based on stochastic approximation (SA) theory of the Robbins-Monro type, to directly optimize the marginal log-likelihood and simultaneously minimize the inclusive KL-divergence.
Variational Inference for Monte Carlo Objectives
The first unbiased gradient estimator designed for importance-sampled objectives is developed, which is both simpler and more effective than the NVIL estimator proposed for the single-sample variational objective, and is competitive with the currently used biased estimators.
Auto-Encoding Variational Bayes
A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
Learning Deep Latent Gaussian Models with Markov Chain Monte Carlo
This paper proposes a different approach to deep latent Gaussian models: rather than use a variational approximation, this work uses Markov chain Monte Carlo (MCMC), which yields higher held-out likelihoods, produces sharper images, and does not suffer from the variational overpruning effect.
Neural Variational Inference and Learning in Belief Networks
This work proposes a fast non-iterative approximate inference method that uses a feedforward network to implement efficient exact sampling from the variational posterior and shows that it outperforms the wake-sleep algorithm on MNIST and achieves state-of-the-art results on the Reuters RCV1 document dataset.
REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
This work introduces a modification to the continuous relaxation of discrete variables and shows that the tightness of the relaxation can be adapted online, removing it as a hyperparameter, leading to faster convergence to a better final log-likelihood.
Importance Weighted Autoencoders
The importance weighted autoencoder (IWAE), a generative model with the same architecture as the VAE, but which uses a strictly tighter log-likelihood lower bound derived from importance weighting, shows empirically that IWAEs learn richer latent space representations than VAEs, leading to improved test log-likelihood on density estimation benchmarks.
Variational Bayesian Inference with Stochastic Search
This work presents an alternative algorithm based on stochastic optimization that allows for direct optimization of the variational lower bound and demonstrates the approach on two non-conjugate models: logistic regression and an approximation to the HDP.
On the quantitative analysis of deep belief networks
It is shown that Annealed Importance Sampling (AIS) can be used to efficiently estimate the partition function of an RBM, and a novel AIS scheme for comparing RBMs with different architectures is presented.
Tighter Variational Bounds are Not Necessarily Better
We provide theoretical and empirical evidence that using tighter evidence lower bounds (ELBOs) can be detrimental to the process of learning an inference network by reducing the signal-to-noise ratio of the gradient estimator.