Principled Pruning of Bayesian Neural Networks through Variational Free Energy Minimization

@article{Beckers2022PrincipledPO,
  title={Principled Pruning of Bayesian Neural Networks through Variational Free Energy Minimization},
  author={Jim Beckers and Bart van Erp and Ziyue Zhao and Kirill Sergeyevich Kondrashov and Bert de Vries},
  journal={ArXiv},
  year={2022},
  volume={abs/2210.09134}
}
Bayesian model reduction provides an efficient approach for comparing the performance of all nested sub-models of a model, without re-evaluating any of these sub-models. Until now, Bayesian model reduction has been applied mainly in the computational neuroscience community. In this paper, we formulate and apply Bayesian model reduction to perform principled pruning of Bayesian neural networks, based on variational free energy minimization. This novel parameter pruning scheme solves the… 
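To make the pruning idea concrete, here is a minimal sketch of how Bayesian model reduction can score a candidate prune for a single weight with a mean-field Gaussian posterior. It is an illustration of the standard closed-form Gaussian model-reduction identity under assumed zero-mean priors, not code from the paper; the function and variable names (delta_free_energy, s2_p, s2_r, ...) are hypothetical.

```python
import numpy as np

def delta_free_energy(mu_q, s2_q, s2_p, s2_r):
    """Change in variational free energy when a zero-mean Gaussian prior
    N(0, s2_p) over one weight is replaced by a tighter prior N(0, s2_r).

    mu_q, s2_q : mean and variance of the mean-field Gaussian posterior q(w)
    s2_p       : variance of the original prior
    s2_r       : variance of the reduced (pruning) prior, e.g. 1e-6

    Uses the closed-form Gaussian Bayesian-model-reduction identity, so the
    reduced model never needs to be re-trained.
    """
    prec_q, prec_p, prec_r = 1.0 / s2_q, 1.0 / s2_p, 1.0 / s2_r
    # Reduced posterior precision and mean (both prior means are zero).
    prec_red = prec_q + prec_r - prec_p
    mu_red = prec_q * mu_q / prec_red
    # Log evidence ratio: ln ∫ q(w) p_reduced(w) / p(w) dw for Gaussians.
    log_evidence_ratio = 0.5 * (
        np.log(prec_q) - np.log(prec_red)
        + np.log(prec_r) - np.log(prec_p)
        + prec_red * mu_red**2 - prec_q * mu_q**2
    )
    # Free energy is taken here as the negative ELBO, so its change is the
    # negative of the log evidence ratio.
    return -log_evidence_ratio

# Example: a weight whose posterior sits close to zero is a pruning candidate
# when the change in free energy is non-positive.
print(delta_free_energy(mu_q=0.01, s2_q=0.05, s2_p=1.0, s2_r=1e-6))
```

Under this sign convention (free energy as the negative ELBO), one plausible pruning rule is to remove a weight whenever the returned change is non-positive, i.e. the reduced model explains the data at least as well as the full model.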

References

Showing 1-10 of 33 references

Sampling-Free Variational Inference of Bayesian Neural Networks by Variance Backpropagation

We propose a new Bayesian neural network formulation that affords variational inference for which the evidence lower bound is analytically tractable, subject to a tight approximation.

Practical Variational Inference for Neural Networks

This paper introduces an easy-to-implement stochastic variational method (or, equivalently, a minimum-description-length loss function) that can be applied to most neural networks, and revisits several common regularisers from a variational perspective.

Bayesian model reduction

This work considers Bayesian model reduction together with structure learning and hierarchical (empirical) Bayes, which can be regarded as metaphors for neurobiological processes such as abductive reasoning.

Assumed Density Filtering Methods for Learning Bayesian Neural Networks

This paper rigorously compares the recently proposed assumed-density-filtering methods for learning Bayesian neural networks, Expectation Backpropagation (EBP) and Probabilistic Backpropagation (PBP), and develops several extensions, including a version of EBP for continuous regression problems and a PBP variant for binary classification.

A Practical Bayesian Framework for Backpropagation Networks

D. MacKay, Neural Computation, 1992
A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks that automatically embodies "Occam's razor," penalizing overflexible and overcomplex models.

Priors in Bayesian Deep Learning: A Review

This review presents an overview of the different priors that have been proposed for (deep) Gaussian processes, variational autoencoders, and Bayesian neural networks, and outlines different methods of learning priors for these models from data.

Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks

This work presents a novel scalable method for learning Bayesian neural networks, called probabilistic backpropagation (PBP), which works by computing a forward propagation of probabilities through the network and then doing a backward computation of gradients.
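As a rough illustration of the "forward propagation of probabilities" idea, the sketch below pushes activation means and variances through a single linear layer with independent Gaussian weight factors. It is a simplified sketch under independence assumptions, not PBP itself: it omits the ReLU moment matching and the backward pass, and the name linear_layer_moments is hypothetical.

```python
import numpy as np

def linear_layer_moments(mu_in, var_in, w_mean, w_var):
    """Mean and variance of z = W a for independent Gaussian weights W and
    inputs a with known per-coordinate means and variances.

    mu_in, var_in : (d_in,)       activation means and variances
    w_mean, w_var : (d_out, d_in) weight means and variances
    """
    mu_out = w_mean @ mu_in
    # Var(w*a) = E[w]^2 Var(a) + Var(w) E[a]^2 + Var(w) Var(a), summed over inputs.
    var_out = (w_mean**2) @ var_in + w_var @ (mu_in**2) + w_var @ var_in
    return mu_out, var_out
```

PBP additionally propagates such moments through the nonlinearities and updates the Gaussian weight factors using gradients of the resulting log marginal likelihood.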

Variational Bayesian dropout: pitfalls and fixes

This work proposes the Quasi-KL (QKL) divergence, a new approximate-inference objective for approximating high-dimensional distributions, and shows that motivations for variational Bernoulli dropout based on discretisation and noise have QKL as a limit.

Deterministic Variational Inference for Robust Bayesian Neural Networks

This work introduces a deterministic method to approximate moments in neural networks, eliminating gradient variance; it also introduces a hierarchical prior over parameters and an empirical Bayes procedure for automatically selecting prior variances, and demonstrates good predictive performance relative to alternative approaches.

On Priors for Bayesian Neural Networks

This dissertation aims to help the reader navigate the landscape of neural network priors: it surveys existing work on priors for neural networks, isolates key themes such as the move towards heavy-tailed priors, and describes how to give Bayesian neural networks an adaptive width by placing stick-breaking priors on their latent representations.