Corpus ID: 235417295

Towards Understanding Generalization via Decomposing Excess Risk Dynamics

Jiaye Teng, Jianhao Ma, Yang Yuan
Generalization is one of the critical issues in machine learning. However, traditional methods like uniform convergence are not powerful enough to fully explain generalization, because they may yield vacuous bounds even in the overparameterized linear regression regime. An alternative is to analyze the generalization dynamics to derive algorithm-dependent bounds, e.g., via stability. Unfortunately, the stability-based bound is still far from explaining the remarkable generalization ability of…
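The classical starting point for this line of work is the split of excess risk into a squared-bias term and a variance term. A minimal Monte Carlo sketch of that split for ordinary least squares (an illustrative setup, not the paper's method) looks like:

```python
import numpy as np

# Illustrative assumption: a well-specified linear model with Gaussian data.
# We estimate bias^2 and variance of the learned predictor by refitting OLS
# on many independently drawn training sets and averaging over a test set.
rng = np.random.default_rng(0)
d, n_train, n_test, trials = 5, 40, 200, 300
w_true = rng.normal(size=d)
X_test = rng.normal(size=(n_test, d))
y_clean = X_test @ w_true                          # noiseless targets

preds = np.empty((trials, n_test))
for t in range(trials):
    X = rng.normal(size=(n_train, d))
    y = X @ w_true + 0.5 * rng.normal(size=n_train)
    w_hat = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS fit on a fresh sample
    preds[t] = X_test @ w_hat

mean_pred = preds.mean(axis=0)
bias_sq = np.mean((mean_pred - y_clean) ** 2)      # systematic error
variance = np.mean(preds.var(axis=0))              # fluctuation across samples
excess_risk = np.mean((preds - y_clean) ** 2)      # avg. squared error to clean targets
print(bias_sq + variance, excess_risk)             # the two sides agree numerically
```

The equality `excess_risk = bias_sq + variance` here is an exact algebraic identity of the empirical averages; what the dynamics-based analyses study is how the two terms evolve over training time rather than at a single fitted solution.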

On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning
A new framework, termed Bayes-Stability, is developed for proving algorithm-dependent generalization error bounds for learning general non-convex objectives; it is further demonstrated that the resulting data-dependent bounds can distinguish randomly labelled data from normal data.
Understanding Double Descent Requires a Fine-Grained Bias-Variance Decomposition
This work describes an interpretable, symmetric decomposition of the variance into terms associated with the randomness from sampling, initialization, and the labels, computes the high-dimensional asymptotic behavior of this decomposition for random-feature kernel regression, and analyzes the strikingly rich phenomenology that arises.
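The idea of attributing variance to distinct sources of randomness can be sketched with the law of total variance. The model and names below are assumptions for illustration (a tiny random-feature regressor), not the paper's code: conditioning on the initialization `W` splits the prediction variance into a data-sampling part and an initialization part.

```python
import numpy as np

# Var[pred] = E_W[Var_D[pred | W]] + Var_W[E_D[pred | W]]
# estimated by nested Monte Carlo over initializations (outer) and
# training-set draws (inner), at a single fixed test point.
rng = np.random.default_rng(1)
d, width, n_train = 3, 20, 30
w_true = rng.normal(size=d)
x_test = rng.normal(size=d)

def fit_predict(W, rng):
    """Draw a fresh training set, fit the linear readout, predict at x_test."""
    X = rng.normal(size=(n_train, d))
    y = X @ w_true + 0.3 * rng.normal(size=n_train)
    Phi = np.maximum(X @ W, 0.0)                     # ReLU random features
    a = np.linalg.lstsq(Phi, y, rcond=None)[0]
    return np.maximum(x_test @ W, 0.0) @ a

n_init, n_data = 30, 30
preds = np.empty((n_init, n_data))
for i in range(n_init):
    W = rng.normal(size=(d, width)) / np.sqrt(d)     # one random initialization
    for j in range(n_data):
        preds[i, j] = fit_predict(W, rng)            # one training-set draw

var_from_data = preds.var(axis=1).mean()             # E_W[Var_D]
var_from_init = preds.mean(axis=1).var()             # Var_W[E_D]
total_var = preds.var()
print(var_from_data + var_from_init, total_var)      # exact identity, up to float error
```

With equal inner sample counts and population variances, the two attributed terms sum exactly to the total variance; the cited work goes further by making the decomposition symmetric across all randomness sources rather than conditioning in a fixed order.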
Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints
This is the first algorithm-dependent result with reasonable dependence on aggregated step sizes for non-convex learning, and it has important implications for the statistical learning aspects of stochastic gradient methods in complicated models such as deep learning.
Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data
By optimizing the PAC-Bayes bound directly, this work extends the approach of Langford and Caruana (2001) and obtains nonvacuous generalization bounds for deep stochastic neural network classifiers with millions of parameters trained on only tens of thousands of examples.
Uniform convergence may be unable to explain generalization in deep learning
Through numerous experiments, this work casts doubt on the power of uniform convergence-based generalization bounds to provide a complete picture of why overparameterized deep networks generalize well.
Train faster, generalize better: Stability of stochastic gradient descent
We show that parametric models trained by a stochastic gradient method (SGM) with few iterations have vanishing generalization error. We prove our results by arguing that SGM is algorithmically stable.
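The stability notion behind this result can be probed empirically: run the same SGD trajectory (shared initialization, shared minibatch order) on two datasets that differ in a single example and measure how far the parameters drift apart. The model, step size, and data below are illustrative assumptions:

```python
import numpy as np

# Neighbouring-dataset experiment: small parameter gap across the two runs
# is the empirical face of algorithmic stability.
rng = np.random.default_rng(2)
n, d, lr, epochs = 50, 5, 0.05, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

X2, y2 = X.copy(), y.copy()
X2[0], y2[0] = rng.normal(size=d), rng.normal()   # replace exactly one example

def sgd(X, y, order):
    w = np.zeros(d)                               # shared initialization
    for _ in range(epochs):
        for i in order:                           # shared sample order
            grad = (X[i] @ w - y[i]) * X[i]       # squared-loss gradient
            w -= lr * grad
    return w

order = list(range(n))
w1, w2 = sgd(X, y, order), sgd(X2, y2, order)
gap = np.linalg.norm(w1 - w2)
print(gap)                                        # stays small relative to ||w1||
```

The stability bounds in the cited paper control (in expectation) how this gap, and hence the generalization error, grows with the number of iterations and the step sizes.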
Stability of SGD: Tightness Analysis and Improved Bounds
It is shown that for general datasets, the existing analysis for convex and strongly convex loss functions is tight, but that it can be improved for non-convex loss functions, and novel, improved data-dependent bounds are given.
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
It is proved that overparameterized neural networks can learn some notable concept classes, including two- and three-layer networks with fewer parameters and smooth activations, via SGD (stochastic gradient descent) or its variants in polynomial time using polynomially many samples.
Stability and Generalization of Learning Algorithms that Converge to Global Optima
This work derives black-box stability results that depend only on the convergence of a learning algorithm and the geometry around the minimizers of the loss function, establishing novel generalization bounds for learning algorithms that converge to global minima.
Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
This paper analyzes training and generalization for a simple two-layer ReLU network with random initialization, providing the following improvements over recent works: a tighter characterization of training speed, an explanation for why training a neural network with random labels leads to slower training, and a data-dependent complexity measure.