Corpus ID: 216553442

Sharpened Generalization Bounds based on Conditional Mutual Information and an Application to Noisy, Iterative Algorithms

@article{Haghifam2020SharpenedGB,
  title={Sharpened Generalization Bounds based on Conditional Mutual Information and an Application to Noisy, Iterative Algorithms},
  author={Mahdi Haghifam and Jeffrey Negrea and Ashish Khisti and Daniel M. Roy and Gintare Karolina Dziugaite},
  journal={ArXiv},
  year={2020},
  volume={abs/2004.12983}
}
The information-theoretic framework of Russo and Zou (2016) and Xu and Raginsky (2017) provides bounds on the generalization error of a learning algorithm in terms of the mutual information between the algorithm's output and the training sample. In this work, we study the proposal, by Steinke and Zakynthinou (2020), to reason about the generalization error of a learning algorithm by introducing a super sample that contains the training sample as a random subset and computing mutual… 
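
For orientation, the two quantities being compared take the following standard forms (stated here up to constants as background, not quoted from the truncated abstract; gen(W,S) denotes the population risk minus the empirical risk of the output W on a training sample S of size n). If the loss is \sigma-subgaussian, the mutual-information bound of Xu and Raginsky reads

\[
\bigl|\mathbb{E}[\mathrm{gen}(W,S)]\bigr| \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W;S)} .
\]

In the construction of Steinke and Zakynthinou, one instead draws a supersample \tilde{Z} \in \mathcal{Z}^{n\times 2} together with independent selection bits U \in \{0,1\}^{n}, trains on the n entries of \tilde{Z} picked out by U, and, for a loss bounded in [0,1], obtains

\[
\bigl|\mathbb{E}[\mathrm{gen}(W,S)]\bigr| \;\le\; \sqrt{\frac{2}{n}\, I\bigl(W; U \mid \tilde{Z}\bigr)} .
\]

Unlike I(W;S), which can be infinite (for example, for deterministic algorithms with continuous outputs), the conditional mutual information I(W; U \mid \tilde{Z}) is at most n bits.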

Citations

Generalization Bounds via Information Density and Conditional Information Density
TLDR
This approach provides bounds on the average generalization error as well as bounds on its tail probability, for both the PAC-Bayesian and single-draw scenarios, and obtains novel bounds that depend on the information density between the training data and the output hypothesis.
Reasoning About Generalization via Conditional Mutual Information
TLDR
This work uses Conditional Mutual Information (CMI) to quantify how well the input can be recognized given the output of the learning algorithm, and shows that bounds on CMI can be obtained from VC dimension, compression schemes, differential privacy, and other methods.
Nonvacuous Loss Bounds with Fast Rates for Neural Networks via Conditional Information Measures
We present a framework to derive bounds on the test loss of randomized learning algorithms for the case of bounded loss functions. This framework leads to bounds that depend on the conditional…
On Random Subset Generalization Error Bounds and the Stochastic Gradient Langevin Dynamics Algorithm
TLDR
This work unifies several expected generalization error bounds based on random subsets using the framework developed by Hellström and Durisi, extends the bounds from Haghifam et al. to stochastic gradient Langevin dynamics, and refines them for loss functions with potentially large gradient norms.
Tighter expected generalization error bounds via Wasserstein distance
TLDR
These results can be seen as a bridge between works that account for the geometry of the hypothesis space and those based on the relative entropy, which is agnostic to such geometry.
Stability Based Generalization Bounds for Exponential Family Langevin Dynamics
TLDR
This paper unifies and substantially generalizes stability-based generalization bounds and introduces Exponential Family Langevin Dynamics (EFLD), a substantial generalization of SGLD that allows exponential-family noise to be used with stochastic gradient descent (SGD).
Individually Conditional Individual Mutual Information Bound on Generalization Error
TLDR
A new information-theoretic bound on the generalization error is proposed, based on a combination of the error decomposition technique of Bu et al. and the conditional mutual information (CMI) construction of Steinke and Zakynthinou, which overcomes the issue of conditioning terms in the CMI.
Conditioning and Processing: Techniques to Improve Information-Theoretic Generalization Bounds
TLDR
A probabilistic graphical representation of the information-theoretic approach to generalization bounds is adopted, and two general techniques to improve the bounds are introduced, namely conditioning and processing.
Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD
TLDR
This paper optimizes the information-theoretic generalization bound by manipulating the noise structure in SGLD and proves that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
Information-theoretic generalization bounds for black-box learning algorithms
TLDR
Information-theoretic generalization bounds for supervised learning algorithms are derived based on the information contained in predictions rather than in the output of the training algorithm, an approach that gives meaningful results for deterministic algorithms and yields quantities that are significantly easier to estimate.

References

Showing 1–10 of 24 references
Chaining Mutual Information and Tightening Generalization Bounds
TLDR
This paper introduces a technique that combines the chaining and mutual information methods to obtain a generalization bound that is both algorithm-dependent and exploits the dependencies between the hypotheses.
Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates
TLDR
This work improves upon the stepwise analysis of noisy iterative learning algorithms and obtains significantly improved mutual information bounds for stochastic gradient Langevin dynamics via data-dependent estimates, using a variational characterization of mutual information.
Tightening Mutual Information Based Bounds on Generalization Error
TLDR
Application to noisy and iterative algorithms, e.g., stochastic gradient Langevin dynamics (SGLD), is also studied, where the constructed bound provides a tighter characterization of the generalization error than existing results.
Generalization Error Bounds for Noisy, Iterative Algorithms
In statistical learning theory, generalization error is used to quantify the degree to which a supervised machine learning algorithm may overfit to training data. Recent work [Xu and Raginsky (2017)]…
On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning
TLDR
A new framework, termed Bayes-Stability, is developed for proving algorithm-dependent generalization error bounds for learning general non-convex objectives, and it is demonstrated that the data-dependent bounds can distinguish randomly labelled data from normal data.
Reasoning About Generalization via Conditional Mutual Information
TLDR
This work uses Conditional Mutual Information (CMI) to quantify how well the input can be recognized given the output of the learning algorithm, and shows that bounds on CMI can be obtained from VC dimension, compression schemes, differential privacy, and other methods.
Generalization error bounds using Wasserstein distances
TLDR
Upper bounds on the generalization error are derived in terms of a certain Wasserstein distance involving the distributions of the input and output of an algorithm under the assumption of a Lipschitz continuous loss function.
Information-theoretic analysis of generalization capability of learning algorithms
We derive upper bounds on the generalization error of a learning algorithm in terms of the mutual information between its input and output. The bounds provide an information-theoretic understanding…
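
As a toy, non-authoritative illustration of the kind of bound described here, the following Python sketch checks the mutual-information bound numerically in a deliberately simple setting. Everything in it is an assumption made for the example (it is not code from any of the papers listed): the data are standard Gaussian, the "algorithm" is a noisy empirical mean, and the squared loss is truncated to [0,1] so that it is 1/2-subgaussian and I(W;S) has a Gaussian-channel closed form.

import numpy as np

# Toy check of the mutual-information generalization bound
# |E[gen]| <= sqrt(2 * sigma^2 * I(W;S) / n) for a sigma-subgaussian loss.
# Assumed setting (illustration only): Z_i ~ N(0,1), the learner outputs
# W = mean(S) + N(0, tau^2), and the loss is min((w - z)^2, 1), which lies
# in [0,1] and is therefore 1/2-subgaussian.

rng = np.random.default_rng(0)
n, tau, sigma = 50, 0.5, 0.5          # sample size, output-noise std, subgaussian constant
n_trials, n_test = 20_000, 200        # Monte Carlo sizes

def loss(w, z):
    return np.minimum((w - z) ** 2, 1.0)

gaps = []
for _ in range(n_trials):
    S = rng.normal(size=n)                     # training sample
    W = S.mean() + tau * rng.normal()          # noisy-empirical-mean "algorithm"
    train_risk = loss(W, S).mean()
    test_risk = loss(W, rng.normal(size=n_test)).mean()
    gaps.append(test_risk - train_risk)

# W depends on S only through its Gaussian sample mean (variance 1/n), so
# I(W;S) equals the Gaussian-channel expression 0.5 * ln(1 + (1/n) / tau^2) nats.
mi = 0.5 * np.log(1.0 + (1.0 / n) / tau ** 2)
bound = np.sqrt(2.0 * sigma ** 2 * mi / n)

print(f"estimated expected generalization gap: {np.mean(gaps):+.4f}")
print(f"mutual-information bound             : {bound:.4f}")

The closed form for I(W;S) is what makes this example checkable end to end; for learners without such structure, the data-dependent estimates discussed in the works above are needed instead.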
Learners that Use Little Information
TLDR
An approach that allows for upper bounds on the amount of information that algorithms reveal about their inputs is discussed, and a lower bound is provided by exhibiting a simple concept class for which every empirical risk minimizer must reveal a lot of information.
Information-theoretic analysis of stability and bias of learning algorithms
TLDR
This work proposes several information-theoretic measures of algorithmic stability and uses them to upper-bound the generalization bias of learning algorithms.