Corpus ID: 53778174

The Deep Weight Prior

Andrei Atanov, Arsenii Ashukha, Kirill Struminsky, Dmitry P. Vetrov, Max Welling
Bayesian inference is known to provide a general framework for incorporating prior knowledge or specific properties into machine learning models via carefully choosing a prior distribution. In this work, we propose a new type of prior distribution for convolutional neural networks, the deep weight prior, that, in contrast to previously published techniques, favors empirically estimated structure of convolutional filters, e.g., spatial correlations of weights. We define deep weight prior as an…


Priors in Bayesian Deep Learning: A Review
An overview of different priors that have been proposed for (deep) Gaussian processes, variational autoencoders, and Bayesian neural networks is presented and different methods of learning priors for these models from data are outlined.
All You Need is a Good Functional Prior for Bayesian Deep Learning
This work proposes a novel and robust framework to match their prior with the functional prior of neural networks based on the minimization of their Wasserstein distance, and provides vast experimental evidence that coupling these priors with scalable Markov chain Monte Carlo sampling offers systematically large performance improvements over alternative choices of priors and state-of-the-art approximate Bayesian deep learning approaches.
Specifying Weight Priors in Bayesian Deep Neural Networks with Empirical Bayes
This work proposes the MOdel Priors with Empirical Bayes using DNN (MOPED) method to choose informed weight priors in Bayesian neural networks and demonstrates that MOPED enables scalable variational inference and provides reliable uncertainty quantification.
Collapsed Variational Bounds for Bayesian Neural Networks
The new bounds significantly improve the performance of Gaussian mean-field VI applied to BNNs on a variety of data sets, and it is found that the tighter ELBOs can be good optimization targets for learning the hyperparameters of hierarchical priors.
MOPED: Efficient priors for scalable variational inference in Bayesian deep neural networks
The proposed Bayesian MOdel Priors Extracted from Deterministic DNN (MOPED) method for stochastic variational inference chooses meaningful prior distributions over weight space using deterministic weights derived from pretrained DNNs of equivalent architecture. It achieves faster training convergence and provides reliable uncertainty quantification, without compromising on the accuracy provided by the deterministic DNNs.
Sparse Uncertainty Representation in Deep Learning with Inducing Weights
This work augments each weight matrix with a small inducing weight matrix, projecting the uncertainty quantification into a lower dimensional space, and extends Matheron’s conditional Gaussian sampling rule to enable fast weight sampling, which enables the inference method to maintain reasonable run-time as compared with ensembles.
Predictive Complexity Priors
Predictive complexity priors are proposed: a functional prior that is defined by comparing the model's predictions to those of a reference function via a change of variables, which is originally defined on the model outputs.
The Functional Neural Process
A new family of exchangeable stochastic processes, the Functional Neural Processes (FNPs), is presented. The work demonstrates that they are scalable to large datasets through mini-batch optimization and describes how they can make predictions for new points via their posterior predictive distribution.
Importance Weighted Hierarchical Variational Inference
This work introduces a new family of variational upper bounds on a marginal log density in the case of hierarchical models (also known as latent variable models) and derives a family of increasingly tighter variational lower bounds on the otherwise intractable standard evidence lower bound for hierarchical variational distributions, enabling the use of more expressive approximate posteriors.
Improving MFVI in Bayesian Neural Networks with Empirical Bayes: a Study with Diabetic Retinopathy Diagnosis
The MOPED method provides reliable uncertainty estimates while outperforming state-of-the-art methods, offering a new strong baseline for the BDL community to compare on complex real-world tasks involving larger models.


Learning Priors for Invariance
The proposed method is akin to posterior variational inference: it chooses a parametric family and optimizes to find the member of the family that makes the model robust to a given transformation, and demonstrates the method’s utility for dropout and rotation transformations.
Variational Dropout Sparsifies Deep Neural Networks
Variational Dropout is extended to the case when dropout rates are unbounded, a way to reduce the variance of the gradient estimator is proposed, and first experimental results with individual dropout rates per weight are reported.
Auto-Encoding Variational Bayes
A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
Probabilistic Meta-Representations Of Neural Networks
This work considers a richer prior distribution in which units in the network are represented by latent variables, and the weights between units are drawn conditionally on the values of the collection of those variables.
Structured Bayesian Pruning via Log-Normal Multiplicative Noise
A new Bayesian model is proposed that takes into account the computational structure of neural networks and provides structured sparsity, e.g. removes neurons and/or convolutional channels in CNNs and provides significant acceleration on a number of deep neural architectures.
Bayesian Compression for Deep Learning
This work argues that the most principled and effective way to attack the problem of compression and computational efficiency in deep learning is by adopting a Bayesian point of view, where through sparsity inducing priors the authors prune large parts of the network.
Multiplicative Normalizing Flows for Variational Bayesian Neural Networks
We reinterpret multiplicative noise in neural networks as auxiliary random variables that augment the approximate posterior in a variational setting for Bayesian neural networks. We show that through
Bayesian Incremental Learning for Deep Neural Networks
This work focuses on a continuous learning setup where the task is always the same and new parts of data arrive sequentially and applies a Bayesian approach to update the posterior approximation with each new piece of data.
Bayesian Regularization and Pruning Using a Laplace Prior
Standard techniques for improved generalization from neural networks include weight decay and pruning, and a comparison is made with results of MacKay using the evidence framework and a Gaussian regularizer.
Stochastic Backpropagation and Approximate Inference in Deep Generative Models
We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and