# Differentiable PAC-Bayes Objectives with Partially Aggregated Neural Networks

```bibtex
@article{Biggs2020DifferentiablePO,
  title   = {Differentiable PAC-Bayes Objectives with Partially Aggregated Neural Networks},
  author  = {Felix Biggs and Benjamin Guedj},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2006.12228}
}
```

We make three related contributions motivated by the challenge of training stochastic neural networks, particularly in a PAC-Bayesian setting: (1) we show how averaging over an ensemble of stochastic neural networks enables a new class of \emph{partially-aggregated} estimators; (2) we show that these lead to provably lower-variance gradient estimates for non-differentiable signed-output networks; (3) we reformulate a PAC-Bayesian bound for these networks to derive a directly optimisable…
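The second contribution concerns gradient estimation through non-differentiable sign activations. As a hedged illustration (this is not the paper's partially-aggregated estimator; the single-unit setup and all names are hypothetical), a score-function (REINFORCE) estimator remains well-defined because the *distribution* over the unit's output depends differentiably on the parameters even though sign() does not:

```python
import numpy as np

rng = np.random.default_rng(0)

def sign_unit_loss_grad(theta, x, y, n_samples=10000):
    """Score-function (REINFORCE) gradient estimate for a single
    sign-activated unit. The sign() output is non-differentiable,
    but P(output = +1) = sigmoid(theta @ x) is differentiable, so
    the gradient of the expected loss can still be estimated."""
    p = 1.0 / (1.0 + np.exp(-theta @ x))                # P(s = +1)
    s = np.where(rng.random(n_samples) < p, 1.0, -1.0)  # sampled signs
    loss = (s - y) ** 2                                 # per-sample loss
    # grad_theta log P(s) = ((s + 1)/2 - p) * x for a Bernoulli output
    score = ((s + 1) / 2 - p)[:, None] * x[None, :]
    return (loss[:, None] * score).mean(axis=0)

theta = np.array([0.5, -0.3])
x = np.array([1.0, 2.0])
g = sign_unit_loss_grad(theta, x, y=1.0)
```

Such estimators are unbiased but typically high-variance, which is the weakness the paper's lower-variance estimates target.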

#### 2 Citations

Tighter risk certificates for neural networks

- Computer Science, Mathematics
- ArXiv
- 2020

Observations suggest that the methods studied here might be good candidates for self-certified learning, in the sense of certifying the risk on any unseen data without the need for data-splitting protocols.

Learning PAC-Bayes Priors for Probabilistic Neural Networks

- Computer Science
- 2021

Recent works have investigated deep learning models trained by optimising PAC-Bayes bounds, with priors that are learnt on subsets of the data. This combination has been shown to lead not only to…

#### References

Showing 1–10 of 17 references.

Auto-Encoding Variational Bayes

- Mathematics, Computer Science
- ICLR
- 2014

A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.

Dichotomize and Generalize: PAC-Bayesian Binary Activated Deep Neural Networks

- Computer Science, Mathematics
- NeurIPS
- 2019

This work develops an end-to-end framework to train binary activated deep neural networks, and provides nonvacuous PAC-Bayesian generalization bounds for them.

Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data

- Computer Science, Mathematics
- UAI
- 2017

By optimizing the PAC-Bayes bound directly, this work extends the approach of Langford and Caruana (2001) to obtain nonvacuous generalization bounds for deep stochastic neural network classifiers with millions of parameters trained on only tens of thousands of examples.

Non-vacuous Generalization Bounds at the ImageNet Scale: a PAC-Bayesian Compression Approach

- Computer Science, Mathematics
- ICLR
- 2019

This paper provides the first non-vacuous generalization guarantees for realistic architectures applied to the ImageNet classification problem and establishes an absolute limit on expected compressibility as a function of expected generalization error.

Weight Uncertainty in Neural Network

- Mathematics, Computer Science
- ICML
- 2015

This work introduces a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop, and shows how the learnt uncertainty in the weights can be used to improve generalisation in non-linear regression problems.
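The core of such algorithms can be sketched in a few lines, under assumptions the summary leaves implicit (a factorized Gaussian posterior with a standard normal prior; the function name is illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_weights_and_kl(mu, rho):
    """Draw one weight sample from a factorized Gaussian posterior
    N(mu, sigma^2) with sigma = softplus(rho), via the reparameterization
    trick (so gradients flow to mu and rho), and return the closed-form
    KL(posterior || N(0, I)) penalty used in the variational objective."""
    sigma = np.log1p(np.exp(rho))                   # softplus keeps sigma > 0
    w = mu + sigma * rng.standard_normal(mu.shape)  # w ~ N(mu, sigma^2)
    kl = 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * np.log(sigma))
    return w, kl

w, kl = sample_weights_and_kl(np.zeros(3), np.zeros(3))
```

The sampled weights feed an ordinary forward pass, and the KL term regularizes the posterior toward the prior.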

Generalized Variational Inference: Three arguments for deriving new Posteriors

- Computer Science
- 2019

An optimization-centric view on, and a novel generalization of, Bayesian inference is introduced, called the Rule of Three (RoT); it is derived axiomatically and recovers existing posteriors as special cases, including the Bayesian posterior and its approximation by standard VI.

PAC-Bayesian Theory Meets Bayesian Inference

- Computer Science, Mathematics
- NIPS
- 2016

For the negative log-likelihood loss function, it is shown that the minimization of PAC-Bayesian generalization risk bounds maximizes the Bayesian marginal likelihood.

Monte Carlo Gradient Estimation in Machine Learning

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2020

A broad and accessible survey of the methods for Monte Carlo gradient estimation in machine learning and across the statistical sciences, covering three strategies (the pathwise, score-function, and measure-valued gradient estimators) together with their historical development, derivation, and underlying assumptions.
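The pathwise and score-function strategies can be contrasted on a toy problem (a sketch, not taken from the survey): estimating d/dmu E[w^2] for w ~ N(mu, 1), whose true value is 2*mu:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, n = 1.5, 100_000
eps = rng.standard_normal(n)
w = mu + eps                       # w ~ N(mu, 1), reparameterized

# Objective: E[w^2]; the true gradient w.r.t. mu is 2 * mu.
pathwise = (2 * w).mean()          # pathwise: differentiate through w = mu + eps
score = (w**2 * (w - mu)).mean()   # score function: f(w) * d/dmu log N(w; mu, 1)
```

Both estimators are unbiased, but on this problem the score-function samples have far higher variance, which is why lower-variance estimators are valuable in practice.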

Adam: A Method for Stochastic Optimization

- Computer Science, Mathematics
- ICLR
- 2015

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
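The update itself is compact enough to sketch (moment-decay defaults follow the paper; the quadratic usage example and function name are illustrative):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m)
    and squared gradient (v), with bias correction for the t-th step."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)           # bias-corrected first moment
    v_hat = v / (1 - b2**t)           # bias-corrected second moment
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# usage: minimize f(theta) = theta^2 (gradient 2 * theta) from theta = 3.0
theta, m, v = 3.0, 0.0, 0.0
for t in range(1, 501):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
```

The per-coordinate scaling by sqrt(v_hat) is what makes the step size roughly invariant to the gradient's magnitude.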

Variational Dropout and the Local Reparameterization Trick

- Computer Science, Mathematics
- NIPS
- 2015

The variational dropout method is proposed, a generalization of Gaussian dropout, but with a more flexibly parameterized posterior, often leading to better generalization in stochastic gradient variational Bayes.
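The local reparameterization trick underlying this line of work can be sketched for a single linear layer with factorized Gaussian weights (a hedged illustration; the function name and shapes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

def local_reparam_layer(x, w_mu, w_logvar):
    """Local reparameterization for a linear layer with factorized Gaussian
    weights N(w_mu, exp(w_logvar)): instead of sampling a weight matrix per
    example, sample the pre-activations directly. They are Gaussian with
    mean x @ w_mu and variance (x**2) @ exp(w_logvar)."""
    act_mu = x @ w_mu
    act_var = (x**2) @ np.exp(w_logvar)
    return act_mu + np.sqrt(act_var) * rng.standard_normal(act_mu.shape)

x = np.ones((8, 3))                  # batch of 8 inputs, 3 features
w_mu = np.zeros((3, 2))              # posterior means, 2 output units
w_logvar = np.full((3, 2), -2.0)     # posterior log-variances
y = local_reparam_layer(x, w_mu, w_logvar)
```

Sampling pre-activations rather than weights decorrelates the noise across examples in a minibatch, which reduces gradient variance.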