Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization

@inproceedings{Haghifam2022LimitationsOI,
  title={Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization},
  author={Mahdi Haghifam and Borja Rodríguez-Gálvez and Ragnar Thobaben and Mikael Skoglund and Daniel M. Roy and Gintare Karolina Dziugaite},
  booktitle={International Conference on Algorithmic Learning Theory},
  year={2022}
}
To date, no "information-theoretic" frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization. In this work, we consider the prospect of establishing such rates via several existing information-theoretic frameworks: input-output mutual information bounds, conditional mutual information bounds and variants, PAC-Bayes bounds, and recent conditional variants thereof. We prove that none of these…
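
As a reference point for the abstract above, here is the standard stochastic convex optimization setup and the rate in question, written in my own shorthand (convex, $L$-Lipschitz losses over a domain of radius $B$), not notation taken from the paper:

\[
F(w) = \mathbb{E}_{Z \sim \mathcal{D}}\bigl[f(w, Z)\bigr], \qquad \hat F_S(w) = \frac{1}{n}\sum_{i=1}^{n} f(w, Z_i), \qquad S = (Z_1, \dots, Z_n) \stackrel{\text{i.i.d.}}{\sim} \mathcal{D}.
\]

For this class, one-pass SGD with step size of order $B/(L\sqrt{n})$ attains excess risk $\mathbb{E}[F(\hat w)] - \min_w F(w) = O(LB/\sqrt{n})$, which is minimax optimal up to constants; the question studied in the paper is whether any of the listed information-theoretic frameworks can certify generalization bounds of that order for gradient descent methods.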

Information Theoretic Lower Bounds for Information Theoretic Upper Bounds

A study of stochastic convex optimization shows that, for true risk minimization, dimension-dependent mutual information is necessary, indicating that existing information-theoretic generalization bounds fall short of capturing the generalization behavior of algorithms such as SGD and regularized ERM, whose sample complexity is dimension-independent.

Exactly Tight Information-Theoretic Generalization Error Bound for the Quadratic Gaussian Problem

It is shown that although the conditional bounding and the reference distribution can make the bound exactly tight, removing them does not significantly degrade the bound, which leads to a mutual-information-based bound that is also asymptotically tight in this setting.

Tighter Information-Theoretic Generalization Bounds from Supersamples

We present a variety of novel information-theoretic generalization bounds for learning algorithms, from the supersample setting of Steinke & Zakynthinou (2020), i.e., the setting of the "conditional mutual information" framework.
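
Several entries on this page refer to this supersample (CMI) setting, so a minimal sketch of the construction may be useful; the array shapes and the Gaussian data below are illustrative placeholders, not anything taken from these papers.

import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5

# Supersample: n i.i.d. pairs of examples, stored as an (n, 2, d) array
# (labels are omitted to keep the sketch short).
z_tilde = rng.normal(size=(n, 2, d))

# Membership bits U_i ~ Bernoulli(1/2): U_i picks which element of the i-th
# pair goes into the training set; the other element is held out.
u = rng.integers(0, 2, size=n)
train = z_tilde[np.arange(n), u]      # the sample S actually shown to the learner
ghost = z_tilde[np.arange(n), 1 - u]  # the unused "ghost" examples

# The conditional mutual information I(A(S); U | z_tilde) then measures how
# much the learner's output reveals about which half of each pair it saw.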

Select without Fear: Almost All Mini-Batch Schedules Generalize Optimally

For smooth (non-Lipschitz) nonconvex losses, it is shown that full-batch (deterministic) GD is essentially optimal, among all possible batch schedules within the considered class, including all stochastic ones.

Information-Theoretic Generalization Bounds for Stochastic Gradient Descent

This work combines the information-theoretic generalization bounds previously used for analyzing randomized variants of SGD with a perturbation analysis of the iterates to provide upper bounds on the generalization error.

On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning

A new framework, termed Bayes-Stability, is developed for proving algorithm-dependent generalization error bounds for learning general non-convex objectives, and it is demonstrated that the data-dependent bounds can distinguish randomly labelled data from normal data.

Generalization Error Bounds for Noisy, Iterative Algorithms

In statistical learning theory, generalization error is used to quantify the degree to which a supervised machine learning algorithm may overfit to training data. Recent work [Xu and Raginsky (2017)]
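
To make the class of algorithms analyzed in this line of work concrete, here is a small Python sketch of a noisy iterative (SGLD-style) update; the quadratic toy loss, step size, and noise scale are arbitrary illustrative choices rather than the paper's setting.

import numpy as np

rng = np.random.default_rng(1)

def noisy_sgd(data, steps=200, lr=0.05, noise_std=0.1):
    """One run of a noisy iterative algorithm: a stochastic gradient step plus Gaussian noise."""
    w = np.zeros(data.shape[1])
    for _ in range(steps):
        z = data[rng.integers(len(data))]   # sample one training example
        grad = w - z                        # gradient of the toy loss 0.5 * ||w - z||^2
        w = w - lr * grad + noise_std * rng.normal(size=w.shape)  # injected Gaussian noise
    return w

w_out = noisy_sgd(rng.normal(size=(50, 5)))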

Information-Theoretic Bayes Risk Lower Bounds for Realizable Models

We derive information-theoretic lower bounds on the Bayes risk and generalization error of realizable machine learning models. In particular, we employ an analysis in which the rate-distortion

Information-theoretic analysis of generalization capability of learning algorithms

We derive upper bounds on the generalization error of a learning algorithm in terms of the mutual information between its input and output. The bounds provide an information-theoretic understanding
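
For reference, the input-output mutual information bound from this work has the following well-known form (restated here in my own notation): if the loss $f(w, Z)$ is $\sigma$-subgaussian under $Z \sim \mathcal{D}$ for every $w$, then

\[
\Bigl| \mathbb{E}\bigl[ L_{\mathcal{D}}(W) - L_S(W) \bigr] \Bigr| \;\le\; \sqrt{\frac{2\sigma^{2}\, I(W; S)}{n}},
\]

where $S = (Z_1, \dots, Z_n)$ is the training sample, $W$ the algorithm's output, $L_S$ the empirical risk, and $L_{\mathcal{D}}$ the population risk.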

An Exact Characterization of the Generalization Error for the Gibbs Algorithm

This work provides an exact characterization of the expected generalization error of the well-known Gibbs algorithm using symmetrized KL information between the input training samples and the output hypothesis and can be applied to tighten existing expected generalization error and PAC-Bayesian bounds.
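
For context, the Gibbs algorithm referred to here samples its output from the Gibbs posterior, which in my own notation (prior $\pi$, inverse temperature $\gamma$) is

\[
P_{W \mid S}(w) \;\propto\; \pi(w)\, e^{-\gamma\, L_S(w)},
\]

and the exact characterization cited above expresses the expected generalization error $\mathbb{E}[L_{\mathcal{D}}(W) - L_S(W)]$ through the symmetrized KL information $I_{\mathrm{SKL}}(S; W)$ between the training sample and the output hypothesis.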

Reasoning About Generalization via Conditional Mutual Information

This work uses Conditional Mutual Information (CMI) to quantify how well the input can be recognized given the output of the learning algorithm, and shows that bounds on CMI can be obtained from VC dimension, compression schemes, differential privacy, and other methods.
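
In the notation of the supersample sketch earlier on this page, the flagship CMI bound takes roughly the following form for losses bounded in $[0,1]$ (constants as usually quoted; the notation is mine):

\[
\Bigl| \mathbb{E}\bigl[ L_{\mathcal{D}}(W) - L_S(W) \bigr] \Bigr| \;\le\; \sqrt{\frac{2\, I\bigl(W; U \mid \tilde Z\bigr)}{n}},
\]

where $\tilde Z$ is the supersample of $2n$ examples and $U \in \{0,1\}^n$ are the membership bits selecting which half forms the training set.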

Tightening Mutual Information Based Bounds on Generalization Error

Application to noisy and iterative algorithms, e.g., stochastic gradient Langevin dynamics (SGLD), is also studied, where the constructed bound provides a tighter characterization of the generalization error than existing results.
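
The tightening referred to here replaces the mutual information with the whole sample by per-example terms; in my notation, for $\sigma$-subgaussian losses the bound reads roughly

\[
\Bigl| \mathbb{E}\bigl[ L_{\mathcal{D}}(W) - L_S(W) \bigr] \Bigr| \;\le\; \frac{1}{n} \sum_{i=1}^{n} \sqrt{2\sigma^{2}\, I(W; Z_i)},
\]

which, by concavity of the square root and $\sum_i I(W; Z_i) \le I(W; S)$ for independent samples, is never larger than the bound in terms of $I(W; S)$ quoted above.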

Formal limitations of sample-wise information-theoretic generalization bounds

It is shown that no such sample-wise information-theoretic bounds exist even for the expected squared generalization gap in the PAC-Bayes and single-draw settings, while PAC-Bayes, single-draw, and expected squared generalization gap bounds that depend on information in pairs of examples do exist.

Upper Bounds on the Generalization Error of Private Algorithms for Discrete Data

This work develops a strategy using this formulation, based on the method of types and typicality, to find explicit upper bounds on the generalization error of stable algorithms, i.e., algorithms that produce similar output hypotheses given similar input datasets.
...