Corpus ID: 219965876

Differentiable PAC-Bayes Objectives with Partially Aggregated Neural Networks

@article{Biggs2020DifferentiablePO,
  title={Differentiable PAC-Bayes Objectives with Partially Aggregated Neural Networks},
  author={Felix Biggs and Benjamin Guedj},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.12228}
}
We make three related contributions motivated by the challenge of training stochastic neural networks, particularly in a PAC-Bayesian setting: (1) we show how averaging over an ensemble of stochastic neural networks enables a new class of \emph{partially-aggregated} estimators; (2) we show that these lead to provably lower-variance gradient estimates for non-differentiable signed-output networks; (3) we reformulate a PAC-Bayesian bound for these networks to derive a directly optimisable…
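
The abstract is truncated above, but the flavour of the aggregation idea admits a simple toy illustration (our own sketch, not the authors' code; the single sign unit, isotropic Gaussian and all names below are assumptions). For weights w ~ N(mu, s^2 I), the expected output of one sign unit, E[sign(w.x)], has the closed form erf(mu.x / (sqrt(2) s ||x||)), which is differentiable in mu even though each sampled output sign(w.x) is not:

```python
# Toy illustration (not the paper's code): aggregating a single signed-output unit
# with Gaussian weights w ~ N(mu, s^2 I) in closed form, versus naive sampling.
import numpy as np
from math import erf

rng = np.random.default_rng(0)
d = 20
x = rng.normal(size=d)              # a fixed input
mu = rng.normal(size=d)             # mean of the weight distribution
s = 0.5                             # shared weight standard deviation

# Naive Monte Carlo: sample weights, average the non-differentiable sign outputs.
n = 100_000
w = mu + s * rng.normal(size=(n, d))
mc = np.mean(np.sign(w @ x))

# Aggregated output: since w @ x ~ N(mu @ x, s^2 ||x||^2),
# E[sign(w @ x)] = erf(mu @ x / (sqrt(2) * s * ||x||)), a smooth function of mu.
agg = erf((mu @ x) / (np.sqrt(2) * s * np.linalg.norm(x)))

print(f"sampled mean : {mc:.4f}")
print(f"closed form  : {agg:.4f}")
```

Aggregating some parts of a network in closed form like this while sampling the rest is roughly the trade-off that the term "partially aggregated" suggests.
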
Citations

Tighter risk certificates for neural networks
Observations suggest that the methods studied here might be good candidates for self-certified learning, in the sense of certifying the risk on any unseen data without the need for data-splitting protocols.
Learning PAC-Bayes Priors for Probabilistic Neural Networks
Recent works have investigated deep learning models trained by optimising PAC-Bayes bounds, with priors that are learnt on subsets of the data. This combination has been shown to lead not only to…

References

SHOWING 1-10 OF 17 REFERENCES
Auto-Encoding Variational Bayes
A stochastic variational inference and learning algorithm is introduced that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case.
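
A minimal sketch of the reparameterisation ("pathwise") trick underlying this algorithm, in a deliberately tiny setting of our own choosing (scalar Gaussian, f(z) = z^2; all names assumed): writing z = mu + sigma * eps with eps ~ N(0, 1) makes Monte Carlo gradient estimates of E[f(z)] with respect to mu and sigma straightforward.

```python
# Sketch of the pathwise / reparameterisation estimator (our toy example, not the
# paper's model): gradients of E_{z~N(mu, sigma^2)}[z**2], whose exact value is
# mu^2 + sigma^2, so the true gradients are 2*mu and 2*sigma.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 1.5, 0.8
n = 200_000

eps = rng.normal(size=n)
z = mu + sigma * eps                 # reparameterised sample, differentiable in (mu, sigma)
f_prime = 2.0 * z                    # f'(z) for f(z) = z**2

grad_mu = np.mean(f_prime * 1.0)     # dz/dmu = 1      -> close to 2*mu = 3.0
grad_sigma = np.mean(f_prime * eps)  # dz/dsigma = eps -> close to 2*sigma = 1.6

print(grad_mu, grad_sigma)
```
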
Dichotomize and Generalize: PAC-Bayesian Binary Activated Deep Neural Networks
This work develops an end-to-end framework to train binary activated deep neural networks, and provides nonvacuous PAC-Bayesian generalization bounds for such networks.
Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data
By optimizing the PAC-Bayes bound directly, this work extends the approach of Langford and Caruana (2001) and obtains nonvacuous generalization bounds for deep stochastic neural network classifiers with millions of parameters, trained on only tens of thousands of examples.
Non-vacuous Generalization Bounds at the ImageNet Scale: a PAC-Bayesian Compression Approach
This paper provides the first non-vacuous generalization guarantees for realistic architectures applied to the ImageNet classification problem and establishes an absolute limit on expected compressibility as a function of expected generalization error.
Weight Uncertainty in Neural Network
This work introduces a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop, and shows how the learnt uncertainty in the weights can be used to improve generalisation in non-linear regression problems.
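
A compact sketch of the kind of objective such an algorithm optimises, under simplifications of our own (factorised Gaussian posterior, standard normal prior, a one-parameter regression model; not the paper's code): one reparameterised weight sample estimates the data term, and the KL to the prior is available in closed form.

```python
# Minimal sketch (assumed setup, not the paper's code) of a Bayes-by-Backprop-style
# objective for a one-parameter linear model y = w*x + noise, with posterior
# q(w) = N(mu, sigma^2) and prior p(w) = N(0, 1).
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=50)
y = 2.0 * x + 0.1 * rng.normal(size=50)                  # synthetic data

def objective(mu, log_sigma):
    sigma = np.exp(log_sigma)
    w = mu + sigma * rng.normal()                        # one reparameterised weight sample
    nll = 0.5 * np.sum((y - w * x) ** 2)                 # Gaussian NLL up to constants
    kl = np.log(1.0 / sigma) + (sigma**2 + mu**2) / 2.0 - 0.5   # KL(N(mu, sigma^2) || N(0, 1))
    return nll + kl                                      # stochastic variational free energy

print(objective(mu=0.0, log_sigma=0.0))
```
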
Generalized Variational Inference: Three arguments for deriving new Posteriors
An optimization-centric view on, and a novel generalization of, Bayesian inference is introduced, called the Rule of Three (RoT), which is derived axiomatically and recovers existing posteriors as special cases, including the Bayesian posterior and its approximation by standard VI.
PAC-Bayesian Theory Meets Bayesian Inference
For the negative log-likelihood loss function, it is shown that the minimization of PAC-Bayesian generalization risk bounds maximizes the Bayesian marginal likelihood.
Monte Carlo Gradient Estimation in Machine Learning
A broad and accessible survey of methods for Monte Carlo gradient estimation in machine learning and across the statistical sciences, exploring three strategies (the pathwise, score-function, and measure-valued gradient estimators) along with their historical development, derivations, and underlying assumptions.
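
A quick toy comparison (ours, with all values assumed) of two of the surveyed strategies: the score-function and pathwise estimators both target d/dmu E_{z~N(mu,1)}[z^2] = 2*mu, but typically with very different variance, the kind of gap that motivates lower-variance estimators such as those in the main paper above.

```python
# Toy comparison (our example) of the score-function and pathwise estimators of
# d/dmu E_{z ~ N(mu, 1)}[f(z)] with f(z) = z**2; the exact gradient is 2*mu.
import numpy as np

rng = np.random.default_rng(3)
mu, n = 2.0, 100_000
z = mu + rng.normal(size=n)

score_fn = (z ** 2) * (z - mu)     # f(z) * d/dmu log N(z; mu, 1)
pathwise = 2.0 * z                 # f'(z) * dz/dmu with z = mu + eps

print("exact gradient :", 2.0 * mu)
print("score function :", score_fn.mean(), " variance:", score_fn.var())
print("pathwise       :", pathwise.mean(), " variance:", pathwise.var())
```
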
Adam: A Method for Stochastic Optimization
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Variational Dropout and the Local Reparameterization Trick
The variational dropout method is proposed as a generalization of Gaussian dropout with a more flexibly parameterized posterior, often leading to better generalization in stochastic gradient variational Bayes.