# Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks

@inproceedings{Vadera2020GeneralizedBP, title={Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks}, author={Meet P. Vadera and Borhan Jalaeian and Benjamin M Marlin}, booktitle={UAI}, year={2020} }

In this paper, we present a general framework for distilling expectations with respect to the Bayesian posterior distribution of a deep neural network classifier, extending prior work on the Bayesian Dark Knowledge framework. The proposed framework takes as input "teacher" and student model architectures and a general posterior expectation of interest. The distillation method performs an online compression of the selected posterior expectation using iteratively generated Monte Carlo samples. We…
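As a rough illustration of the online distillation idea described in the abstract, the following minimal sketch trains a student on a stream of Monte Carlo samples from a teacher posterior. It is not the authors' implementation. It assumes a Bayesian logistic regression teacher with a standard normal prior, sampled with full-batch Langevin dynamics, and a logistic regression student that distills only one expectation, the posterior predictive mean. The function names, toy data, and step sizes are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def make_toy_data(n=200, seed=0):
    """Hypothetical 2-D binary classification data (illustration only)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    y = (X @ np.array([2.0, -1.0]) > 0).astype(float)
    return X, y


def distill_predictive_mean(X, y, n_steps=2000, burn_in=500,
                            teacher_lr=1e-2, student_lr=0.1, seed=0):
    """Distill the teacher's posterior predictive mean into a student online.

    Assumed setup: the teacher posterior is sampled with Langevin dynamics,
    and the student takes one gradient step per teacher sample, so no
    samples need to be stored.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)  # current teacher sample
    phi = np.zeros(d)    # student weights
    for t in range(n_steps):
        # Teacher: Langevin step on the log posterior with an N(0, I) prior.
        grad_log_post = X.T @ (y - sigmoid(X @ theta)) - theta
        noise = rng.normal(size=d) * np.sqrt(teacher_lr)
        theta = theta + 0.5 * teacher_lr * grad_log_post + noise
        if t < burn_in:
            continue  # discard early samples before the chain settles
        # Student: one SGD step on the cross-entropy between the current
        # teacher sample's predictions and the student's predictions.
        target = sigmoid(X @ theta)
        pred = sigmoid(X @ phi)
        phi = phi - student_lr * X.T @ (pred - target) / n
    return phi


X, y = make_toy_data()
phi = distill_predictive_mean(X, y)
student_prob = sigmoid(X @ phi)
accuracy = np.mean((student_prob > 0.5) == (y == 1))
```

Because the cross-entropy is linear in the teacher's predictions, averaging these per-sample gradient steps over the chain pushes the student toward the Monte Carlo estimate of the posterior predictive mean. Other posterior expectations would need a different target and loss in the student step.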

## 13 Citations

### Post-hoc loss-calibration for Bayesian neural networks

- Computer Science
- UAI
- 2021

Methods are developed for correcting approximate posterior predictive distributions so that they prefer high-utility decisions; the approach is agnostic to the choice of approximate inference algorithm, allows efficient test-time decision making through amortization, and empirically produces higher-quality decisions.

### Impact of Parameter Sparsity on Stochastic Gradient MCMC Methods for Bayesian Deep Learning

- Computer Science
- ArXiv
- 2022

This paper uses stochastic gradient MCMC methods as the core Bayesian inference method and considers a variety of approaches for selecting sparse network structures, showing that certain classes of randomly selected substructure can perform as well as substructures derived from state-of-the-art iterative pruning methods while drastically reducing model training times.

### Fast Predictive Uncertainty for Classification with Bayesian Deep Networks

- Computer Science, Mathematics
- UAI
- 2022

It is argued that the resulting Dirichlet distribution has theoretical and practical advantages, in particular more efficient computation of the uncertainty estimate, scaling to large datasets and networks like ImageNet and DenseNet.

### URSABench: Comprehensive Benchmarking of Approximate Bayesian Inference Methods for Deep Neural Networks

- Computer Science
- ArXiv
- 2020

Initial work is described on the development of URSABench, an open-source suite of benchmarking tools for comprehensive assessment of approximate Bayesian inference methods with a focus on deep learning-based classification tasks.

### Self-Distribution Distillation: Efficient Uncertainty Estimation

- Computer Science
- UAI
- 2022

This work proposes a novel training approach, self-distribution distillation (S2D), which can efficiently train a single model that estimates uncertainties, and shows that S2D-based ensembles and novel distilled models can outperform even a standard deep ensemble.

### Dense Uncertainty Estimation

- Computer Science
- ArXiv
- 2021

This work claims that conventional deterministic neural networks for dense prediction tasks are prone to overfitting, which leads to over-confident predictions that are undesirable for decision making, and shows how uncertainty estimation can be used for deep model calibration to achieve well-calibrated models, namely dense model calibration.

### Bayesian Federated Learning via Predictive Distribution Distillation

- Computer Science
- ArXiv
- 2022

This work presents a framework for Bayesian federated learning where each client infers the posterior predictive distribution using its training data and presents various ways to aggregate these client-specific predictive distributions at the server.

### Benchmarking Scalable Predictive Uncertainty in Text Classification

- Computer Science
- IEEE Access
- 2022

This paper empirically investigates why popular scalable uncertainty estimation strategies (Monte-Carlo Dropout, Deep Ensemble) and notable extensions (Heteroscedastic, Concrete Dropout) underestimate uncertainty, and finds that uncertainty estimation benefits from combining posterior approximation procedures, linking it to recent research on how ensembles and variational Bayesian methods navigate the loss landscape.

### DEUP: Direct Epistemic Uncertainty Prediction

- Computer Science
- ArXiv
- 2021

This work proposes a principled framework for directly estimating the excess risk by learning a secondary predictor for the generalization error and subtracting an estimate of aleatoric uncertainty, i.e., intrinsic unpredictability, which is particularly interesting in interactive learning environments.

### Variational- and metric-based deep latent space for out-of-distribution detection

- Computer Science
- UAI
- 2022

This work proposes a new latent space where: 1) each known class is well captured by a nearly-isotropic Gaussian; 2) those Gaussians are far from each other and from the origin of the space (together, these properties effectively leave the area around the origin free for OOD data).

## References

### Adversarial Distillation of Bayesian Neural Network Posteriors

- Computer Science
- ICML
- 2018

These are the first results applying MCMC-based BNNs to the aforementioned downstream applications, and by construction, the framework not only distills the Bayesian predictive distribution, but the posterior itself, which allows one to compute quantities such as the approximate model variance, which is useful in downstream tasks.

### Bayesian dark knowledge

- Computer Science
- NIPS
- 2015

This work describes a method for "distilling" a Monte Carlo approximation to the posterior predictive density into a more compact form, namely a single deep neural network.

### Approximating the Predictive Distribution via Adversarially-Trained Hypernetworks

- Computer Science
- 2018

This work defines a weight posterior to uniformly allow weight realizations of a neural network that meet a chosen fidelity constraint and trains a combination of hypernetwork and main network via the GAN framework by sampling from this posterior predictive distribution.

### Learning Sparse Structured Ensembles with Stochastic Gradient MCMC Sampling and Network Pruning

- Computer Science
- 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP)
- 2018

This work presents the first methodology and empirical study integrating SG-MCMC, group sparse priors, and network pruning for learning NN ensembles, achieving high prediction accuracy while reducing memory and computation cost in both training and testing.

### Bayesian Compression for Deep Learning

- Computer Science
- NIPS
- 2017

This work argues that the most principled and effective way to attack the problem of compression and computational efficiency in deep learning is to adopt a Bayesian point of view, in which sparsity-inducing priors are used to prune large parts of the network.

### Bayesian Learning via Stochastic Gradient Langevin Dynamics

- Computer Science
- ICML
- 2011

In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic…

### Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks

- Computer Science
- ICML
- 2015

This work presents a novel scalable method for learning Bayesian neural networks, called probabilistic backpropagation (PBP), which works by computing a forward propagation of probabilities through the network and then doing a backward computation of gradients.

### Practical Variational Inference for Neural Networks

- Computer Science
- NIPS
- 2011

This paper introduces an easy-to-implement stochastic variational method (or equivalently, minimum description length loss function) that can be applied to most neural networks and revisits several common regularisers from a variational perspective.

### Generative Adversarial Nets

- Computer Science
- NIPS
- 2014

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a…