Corpus ID: 246285638

Information-Theoretic Characterization of the Generalization Error for Iterative Semi-Supervised Learning

Haiyun He, Hanshu Yan, Vincent Y. F. Tan
Using information-theoretic principles, we consider the generalization error (gen-error) of iterative semi-supervised learning (SSL) algorithms that iteratively generate pseudo-labels for a large amount of unlabelled data to progressively refine the model parameters. In contrast to most previous works that bound the gen-error, we provide an exact expression for the gen-error and particularize it to the binary Gaussian mixture model. Our theoretical results suggest that when the class… 
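
As a rough illustration of the iterative pseudo-labelling setup described in the abstract, the following is a minimal sketch on a binary Gaussian mixture. It is not taken from the paper: the function names, the mean-based update rule, the sample sizes, and the values of `mu` and `sigma` are all assumptions chosen for illustration. It tracks only the test error of each iterate and does not compute the gen-error expression derived by the authors.

```python
import random

def sample_bgmm(n, mu, sigma, rng):
    """Draw n labelled points: y uniform in {-1, +1}, x = y * mu + sigma * noise."""
    data = []
    for _ in range(n):
        y = rng.choice((-1, 1))
        x = [y * m + sigma * rng.gauss(0.0, 1.0) for m in mu]
        data.append((x, y))
    return data

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def predict(theta, x):
    """Linear classifier: sign of the inner product, ties broken towards +1."""
    return 1 if dot(theta, x) >= 0 else -1

def mean_signed(xs, ys):
    """Return (1/n) * sum_i y_i * x_i, a simple estimate of the class mean."""
    n, d = len(xs), len(xs[0])
    return [sum(y * x[j] for x, y in zip(xs, ys)) / n for j in range(d)]

def iterative_pseudo_label(labelled, unlabelled_batches):
    """Hypothetical update rule: start from the labelled data, then refit
    on each unlabelled batch using pseudo-labels from the previous iterate."""
    xs = [x for x, _ in labelled]
    ys = [y for _, y in labelled]
    theta = mean_signed(xs, ys)
    history = [theta]
    for batch in unlabelled_batches:
        pseudo = [predict(theta, x) for x in batch]
        theta = mean_signed(batch, pseudo)
        history.append(theta)
    return history

def error_rate(theta, test):
    """Fraction of test points whose predicted label differs from the true one."""
    wrong = sum(1 for x, y in test if predict(theta, x) != y)
    return wrong / len(test)

rng = random.Random(0)
mu, sigma = [1.0, 1.0], 1.0  # assumed class mean and noise level
labelled = sample_bgmm(10, mu, sigma, rng)
# Discard the true labels to obtain unlabelled batches.
batches = [[x for x, _ in sample_bgmm(500, mu, sigma, rng)] for _ in range(3)]
test = sample_bgmm(2000, mu, sigma, rng)

history = iterative_pseudo_label(labelled, batches)
errors = [error_rate(theta, test) for theta in history]
```

Here `errors[t]` is the empirical test error after `t` pseudo-labelling rounds. Varying `sigma` changes how much the two classes overlap, which is the quantity the abstract's truncated final sentence appears to concern.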


Characterizing and Understanding the Generalization Error of Transfer Learning with Gibbs Algorithm
This work provides an information-theoretic analysis of the generalization ability of Gibbs-based transfer learning algorithms by focusing on two popular empirical risk minimization approaches for transfer learning, α-weighted-ERM and two-stage-ERM, and characterizes the generalization errors and excess risks of these two Gibbs algorithms in the asymptotic regime.
Sharpened Generalization Bounds based on Conditional Mutual Information and an Application to Noisy, Iterative Algorithms
This work studies the proposal to reason about the generalization error of a learning algorithm by introducing a super sample that contains the training sample as a random subset and computing mutual information conditional on the super sample, and introduces yet tighter bounds based on the conditional mutual information.
Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates
This work improves upon the stepwise analysis of noisy iterative learning algorithms and obtains significantly improved mutual information bounds for Stochastic Gradient Langevin Dynamics via data-dependent estimates, using a variational characterization of mutual information.
Generalization Error Bounds for Noisy, Iterative Algorithms
In statistical learning theory, generalization error is used to quantify the degree to which a supervised machine learning algorithm may overfit to training data. Recent work [Xu and Raginsky (2017)]
An Exact Characterization of the Generalization Error for the Gibbs Algorithm
This work provides an exact characterization of the expected generalization error of the well-known Gibbs algorithm using symmetrized KL information between the input training samples and the output hypothesis and can be applied to tighten existing expected generalization error and PAC-Bayesian bounds.
Information-theoretic analysis for transfer learning
The results suggest, perhaps as expected, that the Kullback-Leibler (KL) divergence D(μ║μ') plays an important role in characterizing the generalization error in the settings of domain adaptation.
Reasoning About Generalization via Conditional Mutual Information
This work uses Conditional Mutual Information (CMI) to quantify how well the input can be recognized given the output of the learning algorithm, and shows that bounds on CMI can be obtained from VC dimension, compression schemes, differential privacy, and other methods.
Unlabeled Data Improves Adversarial Robustness
It is proved that unlabeled data bridges the complexity gap between standard and robust classification: a simple semisupervised learning procedure (self-training) achieves high robust accuracy using the same number of labels required for achieving high standard accuracy.
Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting, and Regularization
This paper examines binary linear classification under a generative Gaussian mixture model in which the feature vectors take the form x = ±η + q, and identifies conditions under which the interpolating estimator performs better than corresponding regularized estimates.
Tightening Mutual Information Based Bounds on Generalization Error
Application to noisy and iterative algorithms, e.g., stochastic gradient Langevin dynamics (SGLD), is also studied, where the constructed bound provides a tighter characterization of the generalization error than existing results.