Corpus ID: 246035851

Online, Informative MCMC Thinning with Kernelized Stein Discrepancy

@article{Hawkins2022OnlineIM,
  title={Online, Informative MCMC Thinning with Kernelized Stein Discrepancy},
  author={Cole Hawkins and Alec Koppel and Zheng Zhang},
  journal={ArXiv},
  year={2022},
  volume={abs/2201.07130}
}
A fundamental challenge in Bayesian inference is efficient representation of a target distribution. Many non-parametric approaches do so by sampling a large number of points using variants of Markov Chain Monte Carlo (MCMC). We propose an MCMC variant, which we call KSD Thinning, that retains only those posterior samples that exceed a kernelized Stein discrepancy (KSD) threshold. We establish the convergence and complexity tradeoffs for several settings of KSD Thinning as a function of the KSD threshold parameter, sample size… 
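To make the thinning rule concrete, here is a minimal Python sketch of one plausible acceptance test: a new MCMC sample is retained only if adding it lowers the running squared KSD of the retained set by at least a threshold. The IMQ base kernel, the helper names, and this specific acceptance rule are illustrative assumptions and may differ from the paper's exact criterion.

```python
import numpy as np

def imq_stein_kernel(x, y, sx, sy, c=1.0, beta=-0.5):
    """Langevin Stein kernel k_p(x, y) built on the IMQ base kernel
    k(x, y) = (c^2 + ||x - y||^2)^beta; sx, sy are the scores
    grad log p(x) and grad log p(y)."""
    d = x - y
    r2 = float(d @ d)
    base = c ** 2 + r2
    k = base ** beta
    gx = 2.0 * beta * d * base ** (beta - 1.0)   # grad_x k(x, y)
    gy = -gx                                     # grad_y k(x, y)
    trace = (-2.0 * beta * x.size * base ** (beta - 1.0)
             - 4.0 * beta * (beta - 1.0) * r2 * base ** (beta - 2.0))
    return trace + gx @ sy + gy @ sx + k * (sx @ sy)

def ksd_thin(samples, scores, eps=1e-3):
    """Illustrative online KSD thinning: keep a candidate only if it
    reduces the retained set's squared KSD by at least eps (assumed rule)."""
    kept, kept_scores = [samples[0]], [scores[0]]
    # Running sum of Stein-kernel evaluations over all retained pairs.
    pair_sum = imq_stein_kernel(samples[0], samples[0], scores[0], scores[0])
    for x, s in zip(samples[1:], scores[1:]):
        cross = sum(imq_stein_kernel(z, x, sz, s)
                    for z, sz in zip(kept, kept_scores))
        self_term = imq_stein_kernel(x, x, s, s)
        n = len(kept)
        new_pair_sum = pair_sum + 2.0 * cross + self_term
        old_ksd2 = pair_sum / n ** 2
        new_ksd2 = new_pair_sum / (n + 1) ** 2
        if old_ksd2 - new_ksd2 >= eps:   # candidate is informative enough
            kept.append(x)
            kept_scores.append(s)
            pair_sum = new_pair_sum
    return np.array(kept)
```

For a standard Gaussian target the score is simply scores[i] = -samples[i]; a larger eps yields heavier compression at the cost of a looser approximation of the posterior.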
A stochastic Stein Variational Newton method
TLDR
This paper derives, and provides a practical implementation of, a stochastic variant of SVN (sSVN), which is both asymptotically correct and converges rapidly, and is a promising approach to accelerating high-precision Bayesian inference tasks of modest dimension.
Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning
TLDR
This work proposes a novel Kernelized Stein Discrepancy-based Posterior Sampling for RL algorithm (named KSRL), which extends model-based RL based upon posterior sampling (PSRL) in several ways, and develops a novel regret analysis of PSRL based upon integral probability metrics.

References

SHOWING 1-10 OF 35 REFERENCES
Measuring Sample Quality with Kernels
TLDR
A theory of weak convergence for KSDs based on Stein's method is developed; it is demonstrated that commonly used KSDs fail to detect non-convergence even for Gaussian targets, and it is shown that kernels with slowly decaying tails provably determine convergence for a large class of target distributions.
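For reference, the KSD used throughout this line of work compares an empirical measure against a target P with score function ∇ log p through a Stein-modified kernel; a standard form (notation assumed, paraphrasing the usual presentation) is:

```latex
% Squared KSD of an empirical measure against the target P:
\[
  \mathrm{KSD}^2\Big(\tfrac{1}{n}\textstyle\sum_{i=1}^{n}\delta_{x_i},\,P\Big)
  \;=\; \frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{j=1}^{n} k_{p}(x_i, x_j),
\]
% with the Stein kernel built from a base kernel k and the score \nabla\log p:
\[
  k_{p}(x,y) \;=\; \nabla_x\!\cdot\!\nabla_y k(x,y)
  \;+\; \nabla_x k(x,y)\!\cdot\!\nabla_y \log p(y)
  \;+\; \nabla_y k(x,y)\!\cdot\!\nabla_x \log p(x)
  \;+\; k(x,y)\,\nabla_x \log p(x)\!\cdot\!\nabla_y \log p(y).
\]
% A slowly decaying base kernel such as the inverse multiquadric
% k(x,y) = (c^2 + \|x-y\|_2^2)^{\beta} with \beta in (-1, 0) is the
% tail condition this reference recommends.
```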
Dimension-independent likelihood-informed MCMC
Optimal thinning of MCMC output
  • M. Riabiz, W. Chen, C. Oates
  • Computer Science
    Journal of the Royal Statistical Society: Series B (Statistical Methodology)
  • 2022
TLDR
A novel method is proposed, based on greedy minimisation of a kernel Stein discrepancy, that is suitable for problems where heavy compression is required; its effectiveness is demonstrated in the challenging context of parameter inference for ordinary differential equations.
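A minimal sketch of the greedy selection rule described here, assuming a precomputed n-by-n Stein kernel matrix K0 over the full MCMC output (the matrix name and function signature are illustrative):

```python
import numpy as np

def greedy_stein_thinning(K0, m):
    """Select m indices from n MCMC samples by greedily minimising the
    squared KSD of the selected set; K0[i, j] holds the Stein kernel
    value k_p(x_i, x_j) for the full output (assumed precomputed)."""
    n = K0.shape[0]
    running = np.zeros(n)        # sum over selected j of K0[j, i], per candidate i
    selected = []
    for _ in range(m):
        # Adding candidate i increases the pair sum by K0[i, i] + 2 * running[i].
        objective = np.diag(K0) + 2.0 * running
        i = int(np.argmin(objective))
        selected.append(i)
        running += K0[i]         # update cross terms with the new point
    return selected
```

As sketched, an index may be selected more than once; whether to exclude duplicates is a design choice left open here.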
Bayesian Learning via Stochastic Gradient Langevin Dynamics
In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic gradient optimization algorithm we show that the iterates will converge to samples from the true posterior distribution as we anneal the stepsize.
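The update described here is, in its standard form, a stochastic gradient step plus injected Gaussian noise whose variance matches the stepsize. A minimal sketch, where the gradient callables, the dataset size N, and the mini-batch size n are assumed placeholders:

```python
import numpy as np

def sgld_step(theta, grad_log_prior, grad_log_lik_minibatch, N, n, eps, rng):
    """One SGLD update: theta <- theta + (eps/2) * (grad log prior
    + (N/n) * sum of mini-batch likelihood gradients) + N(0, eps) noise."""
    drift = grad_log_prior(theta) + (N / n) * grad_log_lik_minibatch(theta)
    noise = rng.normal(scale=np.sqrt(eps), size=theta.shape)
    return theta + 0.5 * eps * drift + noise
```

In the paper the stepsize eps is annealed toward zero, so the iterates transition smoothly from stochastic optimization to posterior sampling.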
Stein Point Markov Chain Monte Carlo
TLDR
This paper removes the need to solve a global optimisation problem at each iteration by selecting each new point from a Markov chain sample path, which significantly reduces the computational cost of Stein Points and leads to a suite of algorithms that are straightforward to implement.
A Complete Recipe for Stochastic Gradient MCMC
TLDR
This paper provides a general recipe for constructing MCMC samplers--including stochastic gradient versions--based on continuous Markov processes specified via two matrices, and uses the recipe to straightforwardly propose a new state-adaptive sampler: stochastic gradient Riemann Hamiltonian Monte Carlo (SGRHMC).
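The two-matrix recipe referred to here writes every sampler as a diffusion parameterized by a positive semi-definite matrix D(z) and a skew-symmetric matrix Q(z); a paraphrase of the construction (treat the notation as a reconstruction, not a quotation) is:

```latex
% Continuous dynamics that leave the target proportional to exp(-H(z)) invariant:
\[
  \mathrm{d}z \;=\; \big[-\big(D(z) + Q(z)\big)\,\nabla H(z) + \Gamma(z)\big]\,\mathrm{d}t
  \;+\; \sqrt{2\,D(z)}\,\mathrm{d}W_t,
  \qquad
  \Gamma_i(z) \;=\; \sum_{j} \frac{\partial}{\partial z_j}\big(D_{ij}(z) + Q_{ij}(z)\big),
\]
% where D(z) is positive semi-definite, Q(z) is skew-symmetric, and the
% stochastic gradient variants replace \nabla H with a noisy estimate
% plus a matching correction term.
```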
What Are Bayesian Neural Network Posteriors Really Like?
TLDR
It is shown that BNNs can achieve significant performance gains over standard training and deep ensembles, that a single long HMC chain can provide a representation of the posterior comparable to multiple shorter chains, and that posterior tempering is not needed for near-optimal performance.
Subspace Inference for Bayesian Deep Learning
TLDR
Low-dimensional subspaces of parameter space, such as the first principal components of the stochastic gradient descent (SGD) trajectory, are constructed; these subspaces contain diverse sets of high-performing models, and Bayesian model averaging over the induced posterior produces accurate predictions and well-calibrated predictive uncertainty for both regression and image classification.
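A minimal sketch of the subspace construction described above: collect SGD iterates, center them, and take the top principal directions via an SVD (array names and shapes are illustrative):

```python
import numpy as np

def sgd_trajectory_subspace(iterates, rank=5):
    """Build a low-rank affine subspace w = w_mean + P @ z from a matrix of
    flattened SGD iterates with shape (num_checkpoints, num_params)."""
    w_mean = iterates.mean(axis=0)
    deviations = iterates - w_mean
    # Top right-singular vectors span the first principal components
    # of the trajectory.
    _, _, vt = np.linalg.svd(deviations, full_matrices=False)
    P = vt[:rank].T              # (num_params, rank) projection basis
    return w_mean, P

# Inference (e.g. MCMC or variational) is then run over the low-dimensional
# coordinates z, with predictions made at w = w_mean + P @ z.
```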
Consistent Online Gaussian Process Regression Without the Sample Complexity Bottleneck
  • Alec Koppel
  • Computer Science
    2019 American Control Conference (ACC)
  • 2019
TLDR
This work develops the first compression sub-routine for online Gaussian processes that preserves their convergence to the population posterior, i.e., asymptotic posterior consistency, while ameliorating their intractable complexity growth with the sample size.
A Stochastic Newton MCMC Method for Large-Scale Statistical Inverse Problems with Application to Seismic Inversion
TLDR
This work addresses the solution of large-scale statistical inverse problems in the framework of Bayesian inference with a so-called stochastic Newton MCMC method.
…