Corpus ID: 235353015

Out-of-Distribution Generalization in Kernel Regression

Abdulkadir Canatar, Blake Bordelon, Cengiz Pehlevan
In real-world applications, the data-generating process for training a machine learning model often differs from what the model encounters at test time. Understanding how and whether machine learning models generalize under such distributional shifts remains a theoretical challenge. Here, we study generalization in kernel regression when the training and test distributions are different, using the replica method from statistical physics. We derive an analytical formula for the out-of…
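The setting the abstract describes can be reproduced in a toy numerical experiment (a minimal sketch, not the paper's replica-method calculation; the Gaussian train/test distributions, the sine target, and the RBF kernel are illustrative assumptions): kernel ridge regression fit on samples from one input distribution and evaluated on a shifted one.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(X, Y, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||x_i - y_j||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def target(X):
    return np.sin(3 * X).ravel()

# Train on N(0, 1) inputs; test both in-distribution and on a shifted N(1.5, 1)
X_tr = rng.normal(0.0, 1.0, size=(100, 1))
X_in = rng.normal(0.0, 1.0, size=(500, 1))
X_out = rng.normal(1.5, 1.0, size=(500, 1))

lam = 1e-3  # ridge regularization strength
K = rbf_kernel(X_tr, X_tr)
alpha = np.linalg.solve(K + lam * np.eye(len(X_tr)), target(X_tr))

def predict(X):
    return rbf_kernel(X, X_tr) @ alpha

mse_in = np.mean((predict(X_in) - target(X_in)) ** 2)
mse_out = np.mean((predict(X_out) - target(X_out)) ** 2)
```

On this toy problem the shifted test distribution puts mass where training data is sparse, so the out-of-distribution error exceeds the in-distribution one.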


Dimensionality Reduction and Wasserstein Stability for Kernel Regression
A novel stability result of kernel regression with respect to the Wasserstein distance is derived, which allows us to bound errors that occur when perturbed input data is used to fit a kernel function.


Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks
This work investigates generalization error in kernel regression and proposes a predictive theory applicable to real data; the theory explains various generalization phenomena observed in wide neural networks, which admit a kernel limit and generalize well despite being overparameterized.
Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks
A new spectral principle is identified: as the size of the training set grows, kernel machines and neural networks fit successively higher spectral modes of the target function.
To understand deep learning we need to understand kernel learning
It is argued that progress on understanding deep learning will be difficult until more tractable "shallow" kernel methods are better understood, and that new theoretical ideas are needed to understand the properties of classical kernel methods.
Out of Distribution Generalization in Machine Learning
A central topic in the thesis is the strong link between discovering the causal structure of the data, finding features that are reliable predictors regardless of their context, and out-of-distribution generalization.
Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization
The results suggest that regularization is important for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization. The work also introduces a stochastic optimization algorithm, with convergence guarantees, to efficiently train group DRO models.
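The group DRO training loop can be sketched as an online game between model weights and adversarial group weights (a toy logistic-regression version with hypothetical data; not the paper's exact algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy task: two groups whose inputs are shifted relative to each other
n, d = 200, 5
g = rng.integers(0, 2, n)                        # group label per example
X = rng.normal(0, 1, (n, d)) + 2.0 * g[:, None]
w_true = rng.normal(0, 1, d)
y = (X @ w_true > 0).astype(float)               # labels from a shared linear rule

w = np.zeros(d)
q = np.array([0.5, 0.5])                         # adversarial weights over groups
eta_w, eta_q = 0.1, 0.05
counts = np.array([(g == 0).sum(), (g == 1).sum()])

def group_losses(w):
    # Mean logistic loss within each group
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    nll = -(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    return np.array([nll[g == 0].mean(), nll[g == 1].mean()])

for _ in range(500):
    # Exponentiated-gradient ascent on q: up-weight the currently worst group
    q = q * np.exp(eta_q * group_losses(w))
    q = q / q.sum()
    # Descent step on the q-weighted logistic loss
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w = w - eta_w * (X.T @ ((p - y) * q[g] / counts[g]))

worst_group_loss = group_losses(w).max()
```

The inner maximization over q concentrates weight on whichever group currently has the highest loss, so the model is driven to reduce the worst-group rather than the average loss.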
Out-of-Distribution Generalization via Risk Extrapolation (REx)
This work introduces the principle of Risk Extrapolation (REx), and shows conceptually how this principle enables extrapolation, and demonstrates the effectiveness and scalability of instantiations of REx on various OoD generalization tasks.
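One instantiation, V-REx, augments the average training risk with the variance of per-domain risks; a minimal sketch (the toy risk values below are hypothetical):

```python
import numpy as np

def vrex_objective(domain_risks, beta=10.0):
    # V-REx: mean risk across training domains plus a penalty on their variance,
    # pushing the learner toward predictors whose risk is equal across domains
    risks = np.asarray(domain_risks, dtype=float)
    return risks.mean() + beta * risks.var()

# A predictor with uneven per-domain risks is penalized more than one with
# the same average risk spread evenly across domains
uneven = vrex_objective([0.1, 0.5])  # mean 0.3, nonzero variance
even = vrex_objective([0.3, 0.3])    # mean 0.3, zero variance
```

Equalizing risks across training domains is what lets the objective extrapolate to test domains whose mixture of conditions differs from training.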
Dataset Shift in Machine Learning
This volume offers an overview of current efforts to deal with dataset and covariate shift, and places dataset shift in relationship to transfer learning, transduction, local learning, active learning, and semi-supervised learning.
Scalable Kernel Methods via Doubly Stochastic Gradients
An approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients", based on the fact that many kernel methods can be expressed as convex optimization problems; it can scale kernel methods to regimes that were previously dominated by neural nets.
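The random-feature half of the doubly stochastic construction can be sketched with random Fourier features, which give unbiased estimates of the Gaussian kernel (a sketch of the standard Rahimi–Recht construction; the dimensions and bandwidth below are assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.5                          # bandwidth of exp(-gamma * ||x - y||^2)
d, D = 3, 2000                       # input dimension, number of random features

# Frequencies W ~ N(0, 2 * gamma) and phases b ~ U[0, 2 * pi] define the features
W = rng.normal(0.0, np.sqrt(2 * gamma), size=(D, d))
b = rng.uniform(0.0, 2 * np.pi, size=D)

def phi(X):
    # Random Fourier feature map: phi(x) . phi(y) approximates the exact kernel
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

X = rng.normal(0.0, 1.0, size=(50, d))
K_exact = np.exp(-gamma * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
K_approx = phi(X) @ phi(X).T
err = np.abs(K_exact - K_approx).max()
```

The doubly stochastic method additionally samples the data stochastically, so each gradient step touches only a minibatch and a fresh batch of random features; the sketch above shows only the feature-sampling ingredient.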
Capturing the learning curves of generic features maps for realistic data sets with a teacher-student model
A rigorous formula is proved for the asymptotic training loss and generalisation error achieved by empirical risk minimization for the high-dimensional Gaussian covariate model used in teacher-student models.
A theory of learning from different domains
A classifier-induced divergence measure is introduced that can be estimated from finite, unlabeled samples from the domains; the work shows how to choose the optimal combination of source and target error as a function of the divergence, the sample sizes of both domains, and the complexity of the hypothesis class.
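A common empirical stand-in for such a classifier-induced divergence is the proxy A-distance: train a domain classifier to separate unlabeled source and target samples and map its error err to 2(1 - 2·err) (a sketch under assumed toy Gaussian domains; the plain gradient-descent logistic classifier is an illustrative choice, not the paper's estimator):

```python
import numpy as np

rng = np.random.default_rng(0)

def proxy_a_distance(Xs, Xt, steps=300, lr=0.1):
    # Train a logistic domain classifier (source = 0, target = 1) by gradient
    # descent, then convert its error into the proxy A-distance 2 * (1 - 2 * err)
    X = np.vstack([Xs, Xt])
    X = np.hstack([X, np.ones((len(X), 1))])     # bias feature
    z = np.r_[np.zeros(len(Xs)), np.ones(len(Xt))]
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - z) / len(z)
    err = np.mean((X @ w > 0) != z.astype(bool))
    return 2 * (1 - 2 * err)

# Hypothetical domains: an identical and a clearly shifted Gaussian
Xs = rng.normal(0, 1, (300, 2))
X_near = rng.normal(0, 1, (300, 2))              # same distribution as source
X_far = rng.normal(3, 1, (300, 2))               # shifted distribution
d_near = proxy_a_distance(Xs, X_near)
d_far = proxy_a_distance(Xs, X_far)
```

When the classifier cannot tell the domains apart its error is near 0.5 and the estimated divergence is near zero; easily separable domains push the estimate toward its maximum of 2.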