• Corpus ID: 213491882

Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks

Meet P. Vadera, Borhan Jalaeian, Benjamin M. Marlin
In this paper, we present a general framework for distilling expectations with respect to the Bayesian posterior distribution of a deep neural network classifier, extending prior work on the Bayesian Dark Knowledge framework. The proposed framework takes as input "teacher" and "student" model architectures and a general posterior expectation of interest. The distillation method performs an online compression of the selected posterior expectation using iteratively generated Monte Carlo samples. We…
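The online compression idea can be sketched in a few lines (an illustrative numpy toy, not the paper's implementation; the linear "teacher" and the stand-in posterior sampler are assumptions):

```python
import numpy as np

# Toy sketch of online posterior-expectation distillation: each iteration draws
# a new Monte Carlo "posterior sample" of the teacher, updates a running average
# of the posterior expectation (here, class probabilities), and nudges the
# student toward that running target. All names are illustrative.

rng = np.random.default_rng(0)

def teacher_probs(x, theta):
    """Class probabilities for one posterior sample theta (softmax of a linear model)."""
    logits = x @ theta
    e = np.exp(logits - logits.max())
    return e / e.sum()

x = rng.normal(size=3)                 # a single input
student = np.zeros(4)                  # student's estimate of the expectation (4 classes)
running_mean = np.zeros(4)             # online Monte Carlo average

for s in range(1, 201):                # 200 iteratively generated samples
    theta = rng.normal(size=(3, 4))    # stand-in posterior sample of teacher weights
    p = teacher_probs(x, theta)
    running_mean += (p - running_mean) / s      # incremental MC average
    student += 0.1 * (running_mean - student)   # SGD-like step toward the target

# after many samples the student approximates a probability vector
assert abs(student.sum() - 1.0) < 0.05
```

In the actual framework the student is a neural network trained on many inputs; the key point the sketch shows is that the distillation target is refined online as samples arrive, so no bank of posterior samples needs to be stored.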

Post-hoc loss-calibration for Bayesian neural networks

Methods are developed for correcting approximate posterior predictive distributions, encouraging them to prefer high-utility decisions; the approach is agnostic to the choice of approximate inference algorithm, allows efficient test-time decision making through amortization, and empirically produces higher-quality decisions.

Impact of Parameter Sparsity on Stochastic Gradient MCMC Methods for Bayesian Deep Learning

This paper uses stochastic gradient MCMC methods as the core Bayesian inference method and considers a variety of approaches for selecting sparse network structures, showing that certain classes of randomly selected substructures can perform as well as substructures derived from state-of-the-art iterative pruning methods while drastically reducing model training time.

Fast Predictive Uncertainty for Classification with Bayesian Deep Networks

It is argued that the resulting Dirichlet distribution has theoretical and practical advantages, in particular more efficient computation of the uncertainty estimate and scaling to large datasets (e.g., ImageNet) and networks (e.g., DenseNet).
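Once the posterior over class probabilities is summarized by a Dirichlet(α), common uncertainty quantities come in closed form, with no repeated network sampling. A minimal sketch (illustrative parameters, not the paper's construction):

```python
import numpy as np

# With a Dirichlet(alpha) over the class-probability simplex, the mean
# prediction and its entropy are closed-form. Illustrative concentration
# parameters for a 3-class problem where class 0 dominates.

alpha = np.array([8.0, 1.0, 1.0])            # concentration parameters
a0 = alpha.sum()
mean_p = alpha / a0                           # expected class probabilities
entropy = -np.sum(mean_p * np.log(mean_p))    # entropy of the mean prediction

# larger a0 (more evidence) concentrates the Dirichlet, signalling lower
# epistemic uncertainty even when mean_p is unchanged
assert mean_p[0] == 0.8
```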

URSABench: Comprehensive Benchmarking of Approximate Bayesian Inference Methods for Deep Neural Networks

Initial work is described on the development of URSABench, an open-source suite of benchmarking tools for comprehensive assessment of approximate Bayesian inference methods with a focus on deep learning-based classification tasks.

Self-Distribution Distillation: Efficient Uncertainty Estimation

This work proposes a novel training approach, self-distribution distillation (S2D), which is able to efficiently train a single model that can estimate uncertainties, and shows that even a standard deep ensemble can be outperformed using S2D-based ensembles and novel distilled models.

Dense Uncertainty Estimation

It is claimed that conventional deterministic neural networks for dense prediction tasks are prone to overfitting, leading to over-confident predictions that are undesirable for decision making; the work also introduces how uncertainty estimation can be used for deep model calibration to achieve well-calibrated models, namely dense model calibration.

Bayesian Federated Learning via Predictive Distribution Distillation

This work presents a framework for Bayesian federated learning where each client infers the posterior predictive distribution using its training data and presents various ways to aggregate these client-specific predictive distributions at the server.

Benchmarking Scalable Predictive Uncertainty in Text Classification

This paper empirically investigates why popular scalable uncertainty estimation strategies (Monte-Carlo Dropout, Deep Ensemble) and notable extensions (Heteroscedastic, Concrete Dropout) underestimate uncertainty, and finds that uncertainty estimation benefits from combining posterior approximation procedures, linking it to recent research on how ensembles and variational Bayesian methods navigate the loss landscape.
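Monte-Carlo Dropout, one of the strategies investigated, can be sketched in a few lines (a toy numpy model with assumed random weights, not the paper's code): dropout stays active at test time, and uncertainty is read off the spread of the sampled predictions.

```python
import numpy as np

# Toy Monte-Carlo Dropout: run the network many times with dropout ON at
# test time, average the sampled predictive distributions, and use the
# entropy of the average as a total-uncertainty estimate.

rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(5, 16)), rng.normal(size=(16, 3))   # assumed weights
x = rng.normal(size=5)

def forward(x, drop_p=0.5):
    h = np.maximum(x @ W1, 0)
    mask = rng.random(16) > drop_p        # fresh dropout mask per forward pass
    h = h * mask / (1 - drop_p)           # inverted-dropout scaling
    logits = h @ W2
    e = np.exp(logits - logits.max())
    return e / e.sum()

samples = np.stack([forward(x) for _ in range(100)])   # 100 stochastic passes
mean_p = samples.mean(axis=0)                          # predictive distribution
entropy = -np.sum(mean_p * np.log(mean_p))             # total predictive uncertainty

assert np.isclose(mean_p.sum(), 1.0)
```

The paper's observation is that estimates like `entropy` from a single such procedure tend to be too low, and that combining different posterior approximations helps.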

DEUP: Direct Epistemic Uncertainty Prediction

This work proposes a principled framework for directly estimating the excess risk by learning a secondary predictor for the generalization error and subtracting an estimate of aleatoric uncertainty, i.e., intrinsic unpredictability, which is particularly interesting in interactive learning environments.

Variational- and metric-based deep latent space for out-of-distribution detection

This work proposes a new latent space where: 1) each known class is well captured by a nearly-isotropic Gaussian; 2) those Gaussians are far from each other and from the origin of the space (together, these properties effectively leave the area around the origin free for OOD data).

Adversarial Distillation of Bayesian Neural Network Posteriors

These are the first results applying MCMC-based BNNs to the aforementioned downstream applications; by construction, the framework distills not only the Bayesian predictive distribution but the posterior itself, allowing one to compute quantities such as the approximate model variance that are useful in downstream tasks.

Bayesian dark knowledge

This work describes a method for "distilling" a Monte Carlo approximation to the posterior predictive density into a more compact form, namely a single deep neural network.

Approximating the Predictive Distribution via Adversarially-Trained Hypernetworks

This work defines a weight posterior to uniformly allow weight realizations of a neural network that meet a chosen fidelity constraint and trains a combination of hypernetwork and main network via the GAN framework by sampling from this posterior predictive distribution.

  • Yichi Zhang, Zhijian Ou
  • 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP), 2018
This work proposes the first methodology and empirical study integrating SG-MCMC, group sparse priors, and network pruning together for learning NN ensembles, achieving high prediction accuracy and reducing memory and computation cost in both training and testing.

Bayesian Compression for Deep Learning

This work argues that the most principled and effective way to attack the problem of compression and computational efficiency in deep learning is by adopting a Bayesian point of view, where through sparsity inducing priors the authors prune large parts of the network.
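With sparsity-inducing posteriors, a common pruning rule in this line of work keeps weights by their posterior signal-to-noise ratio. A hedged sketch (toy posterior and illustrative threshold, not the paper's exact criterion):

```python
import numpy as np

# Signal-to-noise pruning: remove weights whose posterior mean is small
# relative to their posterior standard deviation. The means, stds, and
# threshold below are illustrative stand-ins, not learned quantities.

rng = np.random.default_rng(4)
mu = rng.normal(size=100)            # stand-in posterior means of 100 weights
sigma = np.full(100, 0.5)            # stand-in posterior stds

snr = np.abs(mu) / sigma
keep = snr > 1.0                     # prune low signal-to-noise weights
pruned = np.where(keep, mu, 0.0)     # zeroed weights can be dropped entirely

assert pruned[~keep].sum() == 0.0    # pruned positions are exactly zero
```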

Bayesian Learning via Stochastic Gradient Langevin Dynamics

In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic gradient optimization algorithm we show that the iterates will converge to samples from the true posterior distribution as we anneal the stepsize.
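A single SGLD step is a half-step of stochastic gradient ascent on the log posterior plus Gaussian noise of variance equal to the step size. A toy 1-D sketch (assumed Gaussian likelihood and prior, illustrative step size):

```python
import numpy as np

# SGLD on a toy problem: data ~ N(theta, 1) with true theta = 2, prior N(0, 10).
# Update: theta += (eps/2) * grad log p(theta | minibatch) + N(0, eps).
# The injected noise is what turns SGD into a posterior sampler.

rng = np.random.default_rng(2)
data = rng.normal(loc=2.0, scale=1.0, size=1000)   # toy dataset
theta, eps = 0.0, 1e-3
N, batch = len(data), 32

samples = []
for t in range(5000):
    mb = rng.choice(data, size=batch)
    # minibatch log-likelihood gradient rescaled to the full dataset, plus prior gradient
    grad = (N / batch) * np.sum(mb - theta) - theta / 10.0
    theta += 0.5 * eps * grad + rng.normal(scale=np.sqrt(eps))
    if t > 1000:                                   # crude burn-in
        samples.append(theta)

# samples concentrate near the posterior mean (~2 for this much data)
assert abs(np.mean(samples) - 2.0) < 0.2
```

In practice the step size is annealed toward zero so the Langevin noise dominates the minibatch gradient noise; the fixed `eps` here is a simplification.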

Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks

This work presents a novel scalable method for learning Bayesian neural networks, called probabilistic backpropagation (PBP), which works by computing a forward propagation of probabilities through the network and then doing a backward computation of gradients.

Practical Variational Inference for Neural Networks

This paper introduces an easy-to-implement stochastic variational method (or equivalently, minimum description length loss function) that can be applied to most neural networks and revisits several common regularisers from a variational perspective.

Generative Adversarial Nets

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
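The adversarial game is the minimax objective min_G max_D E_x[log D(x)] + E_z[log(1 − D(G(z)))], which can be illustrated numerically (hand-picked toy discriminator and data, not the paper's training loop):

```python
import numpy as np

# Toy evaluation of the GAN value function: real data near 0, a poor
# generator producing samples near 4, and a logistic discriminator with
# hand-picked weights that separates the two modes at x = 2.

rng = np.random.default_rng(3)

def D(x, w, b):
    return 1.0 / (1.0 + np.exp(-(w * x + b)))   # logistic discriminator

real = rng.normal(0.0, 1.0, size=500)
fake = rng.normal(4.0, 1.0, size=500)           # a poor generator's samples

w, b = -4.0, 8.0                                # decision boundary at x = -b/w = 2
value = np.mean(np.log(D(real, w, b))) + np.mean(np.log(1 - D(fake, w, b)))

# when G matches the data distribution, the optimal D outputs 0.5 everywhere
# and the value bottoms out at 2 * log(0.5); a discriminator that can
# separate real from fake pushes the value above that floor
assert value > 2 * np.log(0.5)
```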