Stochastic Gradient Descent as Approximate Bayesian Inference
It is demonstrated that constant SGD gives rise to a new variational EM algorithm that optimizes hyperparameters in complex probabilistic models, and a scalable approximate MCMC algorithm, the Averaged Stochastic Gradient Sampler, is proposed.
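As a rough illustration of the sampling idea (a minimal sketch, not the paper's algorithm: the model, data, and step size below are invented for the example), the iterates of SGD run with a fixed learning rate can be collected after burn-in and treated as approximate posterior samples:

```python
# Minimal sketch: constant-rate SGD iterates, collected after burn-in,
# are treated as approximate posterior samples. Model, data, and step
# size here are illustrative assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
N, D, S = 1000, 2, 32            # data size, parameter dim, minibatch size
X = rng.normal(size=(N, D))
theta_true = np.array([1.0, -2.0])
y = X @ theta_true + rng.normal(scale=0.5, size=N)

def minibatch_grad(theta):
    """Stochastic gradient of the negative log posterior, rescaled to full data."""
    idx = rng.choice(N, size=S, replace=False)
    Xb, yb = X[idx], y[idx]
    grad_lik = (N / S) * Xb.T @ (Xb @ theta - yb) / 0.5**2  # Gaussian likelihood
    grad_prior = theta                                       # standard normal prior
    return grad_lik + grad_prior

eps = 1e-4                        # constant learning rate: never decayed
theta = np.zeros(D)
samples = []
for t in range(20000):
    theta = theta - eps * minibatch_grad(theta)
    if t > 5000:                  # discard burn-in, keep stationary iterates
        samples.append(theta.copy())

samples = np.array(samples)
print("posterior-mean estimate:", samples.mean(axis=0))
print("iterate covariance (approximate posterior covariance):")
print(np.cov(samples.T))
```

The key design point is that the learning rate stays constant, so the iterates reach a stationary distribution; for a suitably chosen rate, the spread of that distribution approximates the posterior covariance.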
Disentangled Sequential Autoencoder
Empirical evidence is given for the hypothesis that stochastic RNNs as latent state models are more efficient at compressing and generating long sequences than deterministic ones, which may be relevant for applications in video compression.
Advances in Variational Inference
- C. Zhang, Judith Bütepage, H. Kjellström, S. Mandt
- Computer Science · IEEE Transactions on Pattern Analysis and Machine Intelligence
- 15 November 2017
An overview of recent trends in variational inference is given and a summary of promising future research directions is provided.
How Good is the Bayes Posterior in Deep Neural Networks Really?
Through careful MCMC sampling, this work demonstrates that the posterior predictive induced by the Bayes posterior yields systematically worse predictions than simpler methods, including point estimates obtained from SGD, and argues that it is timely to focus on understanding the origin of the improved performance of cold posteriors.
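For reference, the cold-posterior effect concerns tempered posteriors; a standard way to write the temperature-T posterior in this line of work is:

```latex
% Tempered posterior: with energy U(theta), the temperature-T posterior
% sharpens (T < 1, "cold") or flattens (T > 1, "warm") the Bayes
% posterior, which is recovered at T = 1.
\[
  p_T(\theta \mid \mathcal{D}) \;\propto\; \exp\!\left(-\frac{U(\theta)}{T}\right),
  \qquad
  U(\theta) = -\log p(\mathcal{D} \mid \theta) - \log p(\theta).
\]
```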
Image Anomaly Detection with Generative Adversarial Networks
- Lucas Deecke, Robert A. Vandermeulen, Lukas Ruff, S. Mandt, M. Kloft
- Computer Science · ECML/PKDD
- 10 September 2018
This work proposes a novel approach to anomaly detection using generative adversarial networks, based on searching for a good representation of a given sample in the latent space of the generator; if such a representation is not found, the sample is deemed anomalous.
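A minimal sketch of such a latent-space search is shown below (illustrative only: `G`, `anomaly_score`, and all hyperparameters are invented names, and the paper's full method involves details not shown here, such as multiple search strategies over the generator):

```python
# Sketch of anomaly scoring by searching the generator's latent space.
# `G` is assumed to be a trained generator mapping latent codes z to
# samples shaped like x; all names and hyperparameters are illustrative.
import torch

def anomaly_score(G, x, latent_dim=64, steps=200, lr=0.05, restarts=4):
    """Smallest reconstruction error found over several random restarts;
    a high score suggests x has no good latent representation, i.e. is
    anomalous."""
    best = float("inf")
    for _ in range(restarts):
        z = torch.randn(1, latent_dim, requires_grad=True)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = torch.mean((G(z) - x) ** 2)  # squared reconstruction error
            loss.backward()
            opt.step()
        with torch.no_grad():
            best = min(best, torch.mean((G(z) - x) ** 2).item())
    return best
```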
GP-VAE: Deep Probabilistic Time Series Imputation
This work proposes a new deep sequential latent variable model for dimensionality reduction and data imputation of multivariate time series from the domains of computer vision and healthcare, and demonstrates that this approach outperforms several classical and deep learning-based data imputation methods on high-dimensional data.
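Schematically, the model combines a Gaussian process prior over the latent series with a deep decoder at each time step (notation illustrative; k denotes the GP kernel and p_psi the decoder):

```latex
% Generative sketch: a GP prior ties the latent code across time, a deep
% decoder emits each frame, and missing dimensions of x_t are imputed
% from the (approximate) posterior predictive.
\[
  z_{1:T} \sim \mathcal{GP}\big(0,\, k(t, t')\big),
  \qquad
  x_t \mid z_t \sim p_{\psi}(x_t \mid z_t).
\]
```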
Dynamic Word Embeddings
Experimental results on three different corpora demonstrate that the dynamic model infers word embedding trajectories that are more interpretable and lead to higher predictive likelihoods than competing methods that are based on static models trained separately on time slices.
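The coupling across time can be pictured as a diffusion (random-walk) prior on each word vector, schematically as follows (the paper's exact prior may differ in parametrization):

```latex
% Schematic prior coupling embeddings across time slices t: each word
% vector u_{w,t} performs a random walk, so adjacent time steps share
% statistical strength rather than being trained independently.
\[
  u_{w,t} \mid u_{w,t-1} \;\sim\; \mathcal{N}\!\big(u_{w,t-1},\, \sigma^2 I\big).
\]
```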
Iterative Amortized Inference
This work proposes iterative inference models, which learn to perform inference optimization through repeatedly encoding gradients, and demonstrates the inference optimization capabilities of these models and shows that they outperform standard inference models on several benchmark data sets of images and text.
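A minimal sketch of the idea (all names here, `RefinementNet`, `elbo`, `infer`, are invented for illustration): instead of a one-shot encoder, a small network repeatedly maps the current variational parameters and their ELBO gradient to an updated estimate:

```python
# Sketch of an iterative inference model: a learned network refines the
# variational parameters over several steps using the ELBO gradient.
# `elbo(lam, x)` is assumed to return a scalar ELBO estimate.
import torch
import torch.nn as nn

class RefinementNet(nn.Module):
    def __init__(self, latent_dim):
        super().__init__()
        # maps (current variational params, their ELBO gradient) -> update
        self.net = nn.Sequential(
            nn.Linear(2 * latent_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, lam, grad):
        return lam + self.net(torch.cat([lam, grad], dim=-1))

def infer(refiner, elbo, x, latent_dim, num_steps=5):
    """Refine variational parameters lam over several encoding steps."""
    lam = torch.zeros(x.shape[0], latent_dim)
    for _ in range(num_steps):
        lam = lam.detach().requires_grad_(True)
        loss = -elbo(lam, x)                      # negative ELBO (scalar)
        (grad,) = torch.autograd.grad(loss, lam)  # gradient to be encoded
        lam = refiner(lam, grad)                  # learned update rule
    return lam
```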
Exponential Family Embeddings
On all three applications (neural activity of zebrafish, users' shopping behavior, and movie ratings), the exponential family embedding models are found to be more effective than other types of dimension reduction, better reconstructing held-out data and finding interesting qualitative structure.
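Schematically, an exponential family embedding models each observation conditioned on its context c_i, with the natural parameter built from an inner product of embedding and context vectors (notation approximate; f is a link function):

```latex
% Schematic form: each observation x_i is drawn from an exponential
% family whose natural parameter couples the item's embedding rho_i
% with the context vectors alpha_j of the points in its context c_i.
\[
  x_i \mid x_{c_i} \;\sim\; \mathrm{ExpFam}(\eta_i),
  \qquad
  \eta_i = f\!\Big(\rho_i^{\top} \sum_{j \in c_i} \alpha_j\, x_j\Big).
\]
```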
A Variational Analysis of Stochastic Gradient Algorithms
The analysis rests on interpreting SGD as a continuous-time stochastic process and then minimizing the Kullback-Leibler divergence between its stationary distribution and the target posterior, which shows how to adjust the tuning parameters of SGD.
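Schematically (constants and scalings are approximate here; see the paper for exact statements): near an optimum, constant-rate SGD with minibatch size S behaves like an Ornstein-Uhlenbeck process whose Gaussian stationary distribution q has covariance given by a Lyapunov equation, and the tuning parameters are then chosen by minimizing a KL divergence to the posterior:

```latex
% Ornstein-Uhlenbeck view of constant-rate SGD: A is the local Hessian,
% BB^T the gradient-noise covariance, S the minibatch size, and
% epsilon the learning rate; Sigma is the stationary covariance of q.
\[
  d\theta = -\varepsilon A\,\theta\, dt
            + \frac{\varepsilon}{\sqrt{S}}\, B\, dW(t),
  \qquad
  A\Sigma + \Sigma A^{\top} = \frac{\varepsilon}{S}\, BB^{\top},
\]
\[
  \varepsilon^{*} = \arg\min_{\varepsilon}\;
  \mathrm{KL}\big(q_{\varepsilon}(\theta)\,\|\, p(\theta \mid \mathcal{D})\big).
\]
```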