# Out-of-Distribution Generalization in Kernel Regression

```bibtex
@inproceedings{Canatar2021OutofDistributionGI,
  title     = {Out-of-Distribution Generalization in Kernel Regression},
  author    = {Abdulkadir Canatar and Blake Bordelon and Cengiz Pehlevan},
  booktitle = {NeurIPS},
  year      = {2021}
}
```

In real-world applications, the data-generating process for training a machine learning model often differs from what the model encounters at test time. Understanding how and whether machine learning models generalize under such distributional shifts remains a theoretical challenge. Here, we study generalization in kernel regression when the training and test distributions differ, using the replica method from statistical physics. We derive an analytical formula for the out-of…
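The setting the abstract describes can be made concrete with a minimal sketch: kernel ridge regression fit on one input distribution and evaluated on a shifted one. The kernel, target function, and distributions below are illustrative choices, not the paper's setup.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian (RBF) kernel matrix between row-sets X and Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
target = lambda x: np.sin(3 * x[:, 0])

# Training inputs drawn from N(0, 1); test inputs from a shifted N(1, 1).
X_tr = rng.normal(0.0, 1.0, size=(200, 1))
X_te = rng.normal(1.0, 1.0, size=(500, 1))
y_tr = target(X_tr) + 0.1 * rng.normal(size=200)

lam = 1e-3  # ridge regularizer
K = rbf_kernel(X_tr, X_tr)
alpha = np.linalg.solve(K + lam * np.eye(200), y_tr)
y_pred = rbf_kernel(X_te, X_tr) @ alpha

# Out-of-distribution generalization error: MSE on the shifted test inputs.
ood_error = np.mean((y_pred - target(X_te)) ** 2)
print(f"OOD test MSE: {ood_error:.4f}")
```

The paper's contribution is an analytical formula for this kind of OOD error; the sketch only sets up the quantity being predicted.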

## One Citation

Dimensionality Reduction and Wasserstein Stability for Kernel Regression

- Computer Science, Mathematics, ArXiv
- 2022

A novel stability result of kernel regression with respect to the Wasserstein distance is derived, which allows us to bound errors that occur when perturbed input data is used to fit a kernel function.

## References

Showing 1–10 of 58 references.

To understand deep learning we need to understand kernel learning

- Computer Science, ICML
- 2018

It is argued that progress on understanding deep learning will be difficult until more tractable "shallow" kernel methods are better understood, and that new theoretical ideas are needed for understanding the properties of classical kernel methods.

Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization

- Computer Science, ArXiv
- 2019

The results suggest that regularization is important for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization, and introduce a stochastic optimization algorithm, with convergence guarantees, to efficiently train group DRO models.

Dataset Shift in Machine Learning

- Computer Science
- 2009

This volume offers an overview of current efforts to deal with dataset and covariate shift, and places dataset shift in relationship to transfer learning, transduction, local learning, active learning, and semi-supervised learning.

Scalable Kernel Methods via Doubly Stochastic Gradients

- Computer Science, NIPS
- 2014

An approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients," based on the fact that many kernel methods can be expressed as convex optimization problems; it readily scales kernel methods to regimes dominated by neural nets.
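The random-feature idea underlying this line of work can be sketched briefly: approximate an RBF kernel with random Fourier features so that a kernel method reduces to a linear model amenable to stochastic gradients. The parameters below are illustrative, and this shows only the kernel approximation step, not the full doubly stochastic algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
d, D, gamma = 2, 2000, 0.5  # input dim, number of random features, kernel width

# For k(x, y) = exp(-gamma ||x - y||^2), the spectral measure is N(0, 2*gamma*I),
# so phi(x) = sqrt(2/D) * cos(W x + b) satisfies E[phi(x) . phi(y)] = k(x, y).
W = rng.normal(scale=np.sqrt(2 * gamma), size=(D, d))
b = rng.uniform(0, 2 * np.pi, size=D)

def phi(X):
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

x = rng.normal(size=(1, d))
y = rng.normal(size=(1, d))
exact = np.exp(-gamma * ((x - y) ** 2).sum())
approx = (phi(x) @ phi(y).T).item()
print(exact, approx)  # the approximation concentrates around the exact value as D grows
```

With the kernel replaced by an explicit finite feature map, training becomes ordinary linear regression over `phi(X)`, which is what makes stochastic-gradient scaling possible.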

Capturing the learning curves of generic features maps for realistic data sets with a teacher-student model

- Computer Science, ArXiv
- 2021

A rigorous formula is proved for the asymptotic training loss and generalisation error achieved by empirical risk minimization for the high-dimensional Gaussian covariate model used in teacher-student models.

A theory of learning from different domains

- Computer Science, Machine Learning
- 2009

Introduces a classifier-induced divergence measure that can be estimated from finite, unlabeled samples from the domains, and shows how to choose the optimal combination of source and target error as a function of the divergence, the sample sizes of both domains, and the complexity of the hypothesis class.

Statistical mechanics of learning from examples.

- Computer Science, Physical Review A: Atomic, Molecular, and Optical Physics
- 1992

It is shown that for smooth networks, i.e., those with continuously varying weights and smooth transfer functions, the generalization curve asymptotically obeys an inverse power law, while for nonsmooth networks other behaviors can appear, depending on the nature of the nonlinearities as well as the realizability of the rule.

Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime

- Computer Science, ICML
- 2020

A quantitative theory for the double descent of test error in the so-called lazy learning regime of neural networks is developed by considering the problem of learning a high-dimensional function with random features regression, and it is shown that the bias displays a phase transition at the interpolation threshold, beyond which it remains constant.

Advances in Domain Adaptation Theory

- Computer Science
- 2019

This book provides an overview of the state-of-the-art theoretical results in a specific – and arguably the most popular – subfield of transfer learning called domain adaptation.

On the Inductive Bias of Neural Tangent Kernels

- Computer Science, NeurIPS
- 2019

This work studies smoothness, approximation, and stability properties of functions with finite norm, including stability to image deformations in the case of convolutional networks, and compares to other known kernels for similar architectures.