• Corpus ID: 235353015

Out-of-Distribution Generalization in Kernel Regression

  title={Out-of-Distribution Generalization in Kernel Regression},
  author={Abdulkadir Canatar and Blake Bordelon and Cengiz Pehlevan},
In real word applications, the data generating process for training a machine learning model often differs from what the model encounters in the test stage. Understanding how and whether machine learning models generalize under such distributional shifts remains a theoretical challenge. Here, we study generalization in kernel regression when the training and test distributions are different using the replica method from statistical physics. We derive an analytical formula for the out-of… 

Figures from this paper

Dimensionality Reduction and Wasserstein Stability for Kernel Regression
A novel stability result of kernel regression with respect to the Wasserstein distance is derived, which allows us to bound errors that occur when perturbed input data is used to fit a kernel function.


To understand deep learning we need to understand kernel learning
It is argued that progress on understanding deep learning will be difficult until more tractable "shallow" kernel methods are better understood, and a need for new theoretical ideas for understanding properties of classical kernel methods.
Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization
The results suggest that regularization is important for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization, and introduce a stochastic optimization algorithm, with convergence guarantees, to efficiently train group DRO models.
Dataset Shift in Machine Learning
This volume offers an overview of current efforts to deal with dataset and covariate shift, and places dataset shift in relationship to transfer learning, transduction, local learning, active learning, and semi-supervised learning.
Scalable Kernel Methods via Doubly Stochastic Gradients
An approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients" based on the fact that many kernel methods can be expressed as convex optimization problems, which can readily scale kernel methods up to the regimes which are dominated by neural nets.
Capturing the learning curves of generic features maps for realistic data sets with a teacher-student model
A rigorous formula is proved for the asymptotic training loss and generalisation error achieved by empirical risk minimization for the high-dimensional Gaussian covariate model used in teacher-student models.
A theory of learning from different domains
A classifier-induced divergence measure that can be estimated from finite, unlabeled samples from the domains and shows how to choose the optimal combination of source and target error as a function of the divergence, the sample sizes of both domains, and the complexity of the hypothesis class.
Statistical mechanics of learning from examples.
It is shown that for smooth networks, i.e., those with continuously varying weights and smooth transfer functions, the generalization curve asymptotically obeys an inverse power law, while for nonsmooth networks other behaviors can appear, depending on the nature of the nonlinearities as well as the realizability of the rule.
Double Trouble in Double Descent : Bias and Variance(s) in the Lazy Regime
A quantitative theory for the double descent of test error in the so-called lazy learning regime of neural networks is developed by considering the problem of learning a high-dimensional function with random features regression, and it is shown that the bias displays a phase transition at the interpolation threshold, beyond which it remains constant.
Advances in Domain Adaptation Theory
This book provides an overview of the state-of-the-art theoretical results in a specific – and arguably the most popular – subfield of transfer learning called domain adaptation.
On the Inductive Bias of Neural Tangent Kernels
This work studies smoothness, approximation, and stability properties of functions with finite norm, including stability to image deformations in the case of convolutional networks, and compares to other known kernels for similar architectures.