Dimensionality reduction, regularization, and generalization in overparameterized regressions

@article{Huang2020DimensionalityRR,
  title={Dimensionality reduction, regularization, and generalization in overparameterized regressions},
  author={Ningyuan Teresa Huang and David W. Hogg and Soledad Villar},
  journal={SIAM J. Math. Data Sci.},
  year={2020},
  volume={4},
  pages={126-152}
}
Overparameterization in deep learning is powerful: Very large models fit the training data perfectly and yet generalize well. This realization brought back the study of linear models for regression, including ordinary least squares (OLS), which, like deep learning, shows a "double descent" behavior. This involves two features: (1) The risk (out-of-sample prediction error) can grow arbitrarily when the number of samples $n$ approaches the number of parameters $p$, and (2) the risk decreases with… 
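To make the double-descent picture concrete, here is a minimal Python sketch (not taken from the paper; the Gaussian features, linear teacher, noise level, and feature counts are illustrative assumptions) that evaluates the test error of the minimum-norm least-squares fit as the number of features $p$ sweeps past the number of samples $n$: the error blows up near $p = n$ and comes back down in the overparameterized regime.

```python
# Minimal sketch (not from the paper): double descent of the minimum-norm
# least-squares fit on synthetic data.  The teacher, noise level, and feature
# counts are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 40, 200, 0.5                  # samples, ambient dimension, noise level
beta = rng.normal(size=d) / np.sqrt(d)      # linear teacher

X_train = rng.normal(size=(n, d))
y_train = X_train @ beta + sigma * rng.normal(size=n)
X_test = rng.normal(size=(2000, d))
y_test = X_test @ beta + sigma * rng.normal(size=2000)

for p in (10, 20, 35, 40, 45, 80, 200):     # model size: regress on the first p features
    beta_hat = np.linalg.pinv(X_train[:, :p]) @ y_train   # OLS for p <= n, min-norm for p > n
    risk = np.mean((X_test[:, :p] @ beta_hat - y_test) ** 2)
    print(f"p = {p:3d}   test MSE = {risk:8.3f}")          # peaks near p = n, then decreases
```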

Characterizing the Spectrum of the NTK via a Power Series Expansion

Under mild conditions on the network initialization, a power series expansion for the Neural Tangent Kernel (NTK) of arbitrarily deep feedforward networks in the infinite-width limit is derived, together with an asymptotic upper bound on the spectrum of the NTK.

When do Models Generalize? A Perspective from Data-Algorithm Compatibility

This work theoretically studies compatibility in the setting of overparameterized linear regression solved by gradient descent, and demonstrates that, in the sense of compatibility, generalization holds under significantly weaker restrictions on the problem instance than in previous last-iterate analyses.

Entrywise Recovery Guarantees for Sparse PCA via Sparsistent Algorithms

This paper provides entrywise $\ell_{2,\infty}$ bounds for Sparse PCA under a general high-dimensional subgaussian design and shows that these bounds hold for any algorithm that selects the correct support with high probability, that is, for any sparsistent algorithm.

Support vector machines and linear regression coincide with very high-dimensional features

A super-linear lower bound on the dimension (in terms of sample size) required for support vector proliferation in independent feature models is proved, matching the upper bounds from previous works.
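The coincidence can be checked numerically. The sketch below (an illustration under assumed isotropic Gaussian features and random $\pm 1$ labels, not the construction from the paper) fits a nearly hard-margin linear SVM with scikit-learn and compares its direction to the minimum-norm interpolator of the labels; with $d \gg n$, every training point tends to become a support vector and the two directions nearly coincide.

```python
# Minimal sketch (not from the paper): with d >> n, a (nearly) hard-margin
# linear SVM and the minimum-norm interpolator of the +/-1 labels give
# essentially the same direction.  Gaussian features and random labels are
# illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d = 30, 20000                                  # sample size << dimension
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)

svm = SVC(kernel="linear", C=1e8).fit(X, y)       # very large C approximates the hard margin
w_svm = svm.coef_.ravel()
w_min_norm = np.linalg.pinv(X) @ y                # minimum-norm interpolator of the labels

print("support vectors:", svm.n_support_.sum(), "out of", n)
cos = w_svm @ w_min_norm / (np.linalg.norm(w_svm) * np.linalg.norm(w_min_norm))
print(f"cosine similarity of the two directions: {cos:.4f}")   # close to 1 in this regime
```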

Fitting Very Flexible Models: Linear Regression With Large Numbers of Parameters

  • D. Hogg, Soledad Villar
  • Computer Science, Mathematics
    Publications of the Astronomical Society of the Pacific
  • 2021
Cross-validation is recommended as a good empirical method for model selection (for example, setting the number of parameters and the form of the regularization), and jackknife resampling for estimating the uncertainties of the predictions made by the model.
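As an illustration of the jackknife recommendation, the following sketch (a minimal example with a synthetic linear model and an arbitrary small ridge penalty, not the paper's setup) computes a leave-one-out jackknife uncertainty for a single prediction.

```python
# Minimal sketch (not from the paper): leave-one-out jackknife estimate of the
# uncertainty of one prediction from a linear model.  The synthetic data and
# the small ridge penalty are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 50, 10, 1e-2
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + 0.3 * rng.normal(size=n)
x_new = rng.normal(size=p)

def fit_predict(X, y, x):
    # Ridge-regularized least squares; lam -> 0 recovers OLS.
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return x @ w

pred = fit_predict(X, y, x_new)
loo = np.array([fit_predict(np.delete(X, i, axis=0), np.delete(y, i), x_new)
                for i in range(n)])
var_jack = (n - 1) / n * np.sum((loo - loo.mean()) ** 2)   # standard jackknife variance
print(f"prediction at x_new: {pred:.3f} +/- {np.sqrt(var_jack):.3f}")
```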

References

Showing 1-10 of 82 references

Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint

The exact population risk is derived for unregularized least squares regression with two-layer neural networks, when either the first or the second layer is trained using gradient flow, under different initialization setups.

Precise Tradeoffs in Adversarial Training for Linear Regression

A precise and comprehensive understanding of the role of adversarial training in the context of linear regression with Gaussian features is provided, and the fundamental tradeoff between the standard and robust accuracies achievable by any algorithm, regardless of computational power or size of the training data, is characterized.

Benign overfitting in linear regression

A characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size.

Surprises in High-Dimensional Ridgeless Least Squares Interpolation

This paper recovers, in a precise quantitative way, several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk and the potential benefits of overparametrization.

Surprises in high-dimensional ridgeless least squares interpolation

  • T. Hastie, A. Montanari, S. Rosset, R. J. Tibshirani
  • 2020

Precise tradeoffs in adversarial training for linear regression

  • A. Javanmard, M. Soltanolkotabi, H. Hassani
  • 2020

What causes the test error? Going beyond bias-variance via ANOVA

The analysis of variance (ANOVA) is used to decompose the variance in the test error in a symmetric way, in order to study the generalization performance of certain two-layer linear and non-linear networks, and advanced deterministic equivalent techniques for Haar random matrices are proposed.

How Close Are the Eigenvectors of the Sample and Actual Covariance Matrices?

It is proved that the inner product between eigenvectors of the sample and actual covariance matrices decreases proportionally to the respective eigenvalue distance and the number of samples.
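A quick numerical check of this dependence (a sketch with an assumed spiked diagonal population covariance, not the setting of the paper) compares the top eigenvector of the sample covariance against the true one as the number of samples grows.

```python
# Minimal sketch (not from the paper): alignment between the top eigenvectors of
# the sample and population covariances as the number of samples grows.  The
# spiked diagonal population covariance is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(2)
d = 50
evals = np.ones(d)
evals[-1] = 5.0                            # one spiked eigenvalue; eigenvectors = standard basis
v_true = np.eye(d)[:, -1]                  # population eigenvector of the top eigenvalue

for n in (50, 500, 5000):
    X = rng.normal(size=(n, d)) * np.sqrt(evals)   # rows have covariance diag(evals)
    sample_cov = X.T @ X / n
    _, vecs = np.linalg.eigh(sample_cov)           # eigenvalues in ascending order
    v_sample = vecs[:, -1]
    # Alignment improves with n; it also degrades as the eigenvalue gap shrinks.
    print(f"n = {n:5d}   |<v_sample, v_true>| = {abs(v_sample @ v_true):.3f}")
```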

Fitting Very Flexible Models: Linear Regression With Large Numbers of Parameters

  • D. Hogg, Soledad Villar
  • Computer Science, Mathematics
    Publications of the Astronomical Society of the Pacific
  • 2021
Cross-validation is recommended as a good empirical method for model selection (for example, setting the number of parameters and the form of the regularization), and jackknife resampling for estimating the uncertainties of the predictions made by the model.

Kernel regression in high dimension: Refined analysis beyond double descent

This refined analysis goes beyond the double descent theory by showing that, depending on the data eigen-profile and the level of regularization, the kernel regression risk curve can be a double-descent-like, bell-shaped, or monotonic function of $n$.
...