Corpus ID: 3595524

Entropy-SGD optimizes the prior of a PAC-Bayes bound: Data-dependent PAC-Bayes priors via differential privacy

@inproceedings{Dziugaite2018EntropySGDOT,
  title={Entropy-SGD optimizes the prior of a PAC-Bayes bound: Data-dependent PAC-Bayes priors via differential privacy},
  author={Gintare Karolina Dziugaite and Daniel M. Roy},
  booktitle={NeurIPS},
  year={2018}
}
We show that Entropy-SGD (Chaudhari et al., 2017), when viewed as a learning algorithm, optimizes a PAC-Bayes bound on the risk of a Gibbs (posterior) classifier, i.e., a randomized classifier obtained by a risk-sensitive perturbation of the weights of a learned classifier. Entropy-SGD works by optimizing the bound’s prior, violating the hypothesis of the PAC-Bayes theorem that the prior is chosen independently of the data. Indeed, available implementations of Entropy-SGD rapidly obtain zero… 
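
For orientation, one standard member of this family is the PAC-Bayes-kl bound (in the form due to Langford and Seeger, tightened by Maurer); the paper's argument concerns bounds of this kind, though not necessarily this exact statement. If the prior P is chosen independently of the m-element training sample S, then for any δ ∈ (0, 1), with probability at least 1 − δ over S, simultaneously for all posteriors Q,

    \[
      \mathrm{kl}\!\left( \hat{L}_S(Q) \,\middle\|\, L_{\mathcal{D}}(Q) \right)
      \;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{m}}{\delta}}{m},
    \]

where \hat{L}_S(Q) and L_{\mathcal{D}}(Q) are the empirical and population risks of the Gibbs classifier Q, and kl(q ‖ p) denotes the KL divergence between Bernoulli(q) and Bernoulli(p). The requirement that P be independent of S is precisely the hypothesis that a data-dependent prior, such as the one Entropy-SGD implicitly optimizes, violates; the paper's differential-privacy argument is what restores a valid bound for such priors.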

Citations

Entropy-SGD optimizes the prior of a PAC-Bayes bound: Generalization properties of Entropy-SGD and data-dependent priors
We show that Entropy-SGD (Chaudhari et al., 2017), when viewed as a learning algorithm, optimizes a PAC-Bayes bound on the risk of a Gibbs (posterior) classifier, i.e., a randomized classifier…
On the role of data in PAC-Bayes bounds
TLDR
This work shows that the bound based on the oracle prior can be suboptimal, and applies this new principle in the setting of nonconvex learning, simulating data-dependent oracle priors on MNIST and Fashion MNIST with and without held-out data, and demonstrating new nonvacuous bounds in both cases.
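
For reference, the “oracle” prior discussed above is usually taken to be the data-distribution-dependent prior that minimizes the expected KL term of the bound; a standard decomposition (not specific to the cited paper) identifies it as the expected posterior:

    \[
      \mathbb{E}_S\big[\mathrm{KL}(Q_S \,\|\, P)\big]
      \;=\; \mathbb{E}_S\big[\mathrm{KL}(Q_S \,\|\, \bar{Q})\big] \;+\; \mathrm{KL}(\bar{Q} \,\|\, P),
      \qquad \bar{Q} = \mathbb{E}_S[Q_S],
    \]

so the minimizing prior is P* = \bar{Q}. The cited work shows that a bound built on this oracle choice can still be suboptimal.
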
PAC-Bayes Analysis with Stochastic Kernels
TLDR
A general form of the PAC-Bayes inequality is presented, from which one may derive extensions of various known PAC-Bayes bounds as well as novel bounds, for stochastic learning models where the learner observes a finite set of training examples.
PAC-Bayes Analysis Beyond the Usual Bounds
TLDR
A basic PAC-Bayes inequality for stochastic kernels is presented, from which one may derive extensions of various known PAC-Bayes bounds as well as novel bounds; a simple bound for a loss function with unbounded range is also given.
Fast-rate PAC-Bayes Generalization Bounds via Shifted Rademacher Processes
TLDR
A new framework is established for deriving fast-rate PAC-Bayes bounds in terms of the “flatness” of the empirical risk surface on which the posterior concentrates, yielding new insights into PAC-Bayesian theory.
Revisiting generalization for deep learning: PAC-Bayes, flat minima, and generative models
TLDR
This work proposes to use a two-sample test statistic for training neural network generator models, bounds the gap between the population statistic and its empirical estimate, and obtains nonvacuous generalization bounds for stochastic classifiers based on SGD solutions.
User-friendly introduction to PAC-Bayes bounds
TLDR
This paper describes a simplified version of the localization technique of [34, 36] that was missed by the community and later rediscovered as “mutual information bounds”, and it aims to provide an elementary introduction to PAC-Bayes theory.
Learning PAC-Bayes Priors for Probabilistic Neural Networks
TLDR
This work asks how much data should optimally be allocated to building the prior, shows that the optimum may be dataset dependent, and demonstrates that using a small percentage of the prior-building data for validation of the prior leads to promising results.
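
To make the data-allocation question above concrete, here is a minimal sketch of the usual split-based protocol (illustrative only; the function name, split ratio, and example sizes are assumptions, not taken from the cited paper): a fraction alpha of the training set builds the prior, and the PAC-Bayes bound is evaluated on the remaining examples, with respect to which the prior is data-independent.

    import numpy as np

    def split_for_prior(n_examples, alpha, seed=0):
        """Partition example indices into a prior-building set and a bound set.

        The prior is learned only from `prior_idx`; the PAC-Bayes bound is then
        evaluated on `bound_idx` (sample size m = len(bound_idx)), so the prior
        is independent of the data entering the bound.
        """
        rng = np.random.default_rng(seed)
        perm = rng.permutation(n_examples)
        n_prior = int(alpha * n_examples)
        return perm[:n_prior], perm[n_prior:]

    # e.g., use half of a 60,000-example training set to build the prior
    prior_idx, bound_idx = split_for_prior(60_000, alpha=0.5)
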
PAC-Bayes with Backprop
TLDR
It is suggested that neural nets trained by PBB may lead to self-bounding learning, where the available data can be used to simultaneously learn a predictor and certify its risk, with no need to follow a data-splitting protocol.
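
As a rough illustration of the “certify its risk” step, the sketch below evaluates a McAllester-style relaxation of the PAC-Bayes-kl bound from an empirical Gibbs risk, a KL term, the sample size, and a confidence level. PBB-style training backpropagates through a differentiable surrogate of a quantity like this, but the exact objective and constants used in the cited paper may differ; the numbers in the example are made up.

    import math

    def pac_bayes_risk_bound(emp_risk, kl, m, delta):
        """McAllester-style bound on the true Gibbs risk:
        emp_risk + sqrt((KL(Q||P) + ln(2*sqrt(m)/delta)) / (2*m)).
        Follows from the PAC-Bayes-kl inequality via Pinsker's inequality.
        """
        complexity = (kl + math.log(2.0 * math.sqrt(m) / delta)) / (2.0 * m)
        return emp_risk + math.sqrt(complexity)

    # e.g., empirical Gibbs error 0.03, KL(Q||P) = 2500 nats, m = 30,000, 95% confidence
    print(pac_bayes_risk_bound(0.03, 2500.0, 30_000, delta=0.05))  # ~0.23
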
PAC-Bayes bounds for stable algorithms with instance-dependent priors
TLDR
This paper estimates the risk of the randomized algorithm in terms of the hypothesis stability coefficients and provides a new bound for the SVM classifier, which appears to be the first stability-based bound that evaluates to non-trivial values.
…

References

Showing 1-10 of 61 references
A PAC-Bayesian Tutorial with A Dropout Bound
TLDR
The training-variance bound dominates the other bounds but is more difficult to interpret; it seems to suggest variance-reduction methods such as bagging and may ultimately provide a more meaningful analysis of dropout.
PAC-Bayes bounds for stable algorithms with instance-dependent priors
TLDR
This paper estimates the risk of the randomized algorithm in terms of the hypothesis stability coefficients and provides a new bound for the SVM classifier, which appears to be the first stability-based bound that evaluates to non-trivial values.
A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent
TLDR
This work inspires an adaptive sampling algorithm for SGD that optimizes the posterior at runtime, and demonstrates that adaptive sampling can reduce empirical risk faster than uniform sampling while also improving out-of-sample accuracy.
Differential privacy and generalization: Sharper bounds with applications
PAC-bayes bounds with data dependent priors
TLDR
The experimental work illustrates that the new bounds can be significantly tighter than the original PAC-Bayes bound when applied to SVMs, and among them the combination of the prior PAC-Bayes bound and the prior SVM algorithm gives the tightest bound.
Simpler PAC-Bayesian bounds for hostile data
TLDR
This paper provides PAC-Bayesian learning bounds that hold for dependent, heavy-tailed observations (hereafter referred to as hostile data); it proves a general PAC-Bayesian bound and shows how to use it in various hostile settings.
A Bayesian Perspective on Generalization and Stochastic Gradient Descent
TLDR
It is proposed that the noise introduced by small mini-batches drives the parameters towards minima whose evidence is large, and it is demonstrated that, when one holds the learning rate fixed, there is an optimum batch size which maximizes the test set accuracy.
Private Convex Empirical Risk Minimization and High-dimensional Regression
TLDR
This work significantly extends the analysis of the “objective perturbation” algorithm of Chaudhuri et al. (2011) for convex ERM problems, and gives the best known algorithms for differentially private linear regression.
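
For context on the algorithm being extended, objective perturbation privatizes regularized convex ERM by adding a random linear term to the objective before minimizing. Schematically (the noise distribution, and in some variants an extra quadratic term, are calibrated to the privacy parameter ε, the regularization strength λ, and the Lipschitz/smoothness constants of the loss; see the cited papers for exact constants):

    \[
      \hat{\theta}_{\mathrm{priv}}
      \;=\; \arg\min_{\theta}\;
      \frac{1}{n}\sum_{i=1}^{n} \ell(\theta; z_i)
      \;+\; \frac{\lambda}{2}\,\lVert \theta \rVert_2^2
      \;+\; \frac{1}{n}\, b^{\top}\theta,
      \qquad b \sim \text{noise calibrated to } \varepsilon .
    \]
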
A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity
TLDR
These results recover optimal bounds for VC- and large (polynomial entropy) classes, replacing localized Rademacher complexity by a simpler analysis which almost completely separates the two aspects that determine the achievable rates: 'easiness' (Bernstein) conditions and model complexity.
PAC-BAYESIAN SUPERVISED CLASSIFICATION: The Thermodynamics of Statistical Learning
TLDR
An alternative selection scheme based on relative bounds between estimators is described and studied, and a two-step localization technique that can handle the selection of a parametric model within a family of models is presented.
…