Bandwidth Selection for Gaussian Kernel Ridge Regression via Jacobian Control

Oskar Allerbo and Rebecka Jörnsten
Most machine learning methods depend on the tuning of hyper-parameters. For kernel ridge regression (KRR) with the Gaussian kernel, the hyper-parameter is the bandwidth, which specifies the length-scale of the kernel and must be carefully selected to obtain a model that generalizes well. The default method for bandwidth selection is cross-validation, which often yields good results, albeit at a high computational cost. Furthermore, the estimates provided by cross-validation…
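To make the roles of the two hyper-parameters concrete, here is a minimal NumPy sketch of closed-form Gaussian KRR. The function names and the specific values of `sigma` (bandwidth) and `lam` (ridge penalty) are illustrative; this is the standard KRR fit, not the paper's Jacobian-based selection method:

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    # K[i, j] = exp(-||x_i - z_j||^2 / (2 * sigma^2)); sigma is the bandwidth
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Z**2, axis=1)[None, :] - 2 * X @ Z.T)
    return np.exp(-sq_dists / (2 * sigma**2))

def krr_fit_predict(X_train, y_train, X_test, sigma, lam):
    # Closed-form KRR: alpha = (K + lam * I)^{-1} y, prediction = k(x)^T alpha
    K = gaussian_kernel(X_train, X_train, sigma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)
    return gaussian_kernel(X_test, X_train, sigma) @ alpha
```

A too-small bandwidth interpolates noise while a too-large one oversmooths, which is why selecting `sigma` well matters; cross-validation would repeat this fit over a grid of candidate bandwidths.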

Data-Driven Bandwidth Selection in Local Polynomial Fitting: Variable Bandwidth and Spatial Adaptation
When estimating a mean regression function and its derivatives, locally weighted least squares regression has proven to be a very attractive technique. The present paper focuses on the important…
Spectrally-truncated kernel ridge regression and its free lunch
  • A. Amini
  • Computer Science, Mathematics
    Electronic Journal of Statistics
  • 2021
It is shown that, as long as the RKHS is infinite-dimensional, there is a threshold on the truncation level r above which the spectrally-truncated KRR, surprisingly, outperforms the full KRR in terms of the minimax risk, where the minimum is taken over the regularization parameter.
When do neural networks outperform kernel methods?
It is shown that the curse of dimensionality becomes milder if the covariates display the same low-dimensional structure as the target function, and a spiked-covariates model is presented that captures, in a unified framework, both behaviors observed in earlier work.
A reliable data-based bandwidth selection method for kernel density estimation
The key to the success of the current procedure is the reintroduction of a non-stochastic term that was previously omitted, together with use of the bandwidth to reduce bias in estimation without inflating variance.
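As a point of reference for such data-based selectors, the classical baseline is Silverman's rule-of-thumb bandwidth for a one-dimensional Gaussian kernel density estimator. A sketch (this is the simple reference rule, not the procedure described above):

```python
import numpy as np

def silverman_bandwidth(x):
    # Rule-of-thumb bandwidth for a 1-D Gaussian KDE:
    # h = 0.9 * min(sample std, IQR / 1.34) * n^(-1/5)
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    return 0.9 * min(x.std(ddof=1), (q75 - q25) / 1.34) * len(x) ** (-0.2)
```

The rule is derived by minimizing asymptotic mean integrated squared error under a Gaussian reference density, so it can oversmooth multimodal data, which is precisely what motivates the data-based selectors above.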
Comparison of data-driven bandwidth selectors
This article compares several promising data-driven methods for selecting the bandwidth of a kernel density estimator and concludes that the plug-in rule is the best of those currently available, though there is still room for improvement.
On the Similarity between the Laplace and Neural Tangent Kernels
It is shown that NTK for fully connected networks is closely related to the standard Laplace kernel, and theoretically that for normalized data on the hypersphere both kernels have the same eigenfunctions and their eigenvalues decay polynomially at the same rate, implying that their Reproducing Kernel Hilbert Spaces (RKHS) include the same sets of functions.
A Review and Comparison of Bandwidth Selection Methods for Kernel Regression
Given the need for automatic data-driven bandwidth selectors in applied statistics, this review is intended to explain and, above all, compare the available methods for selecting the bandwidth.
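One of the standard selectors such reviews compare is leave-one-out cross-validation. A minimal sketch for a Nadaraya-Watson (local-constant) regression estimator with a Gaussian kernel; the function names and candidate grid are hypothetical:

```python
import numpy as np

def nw_loocv_score(x, y, h):
    # Leave-one-out CV error of a Nadaraya-Watson estimator: each y_i is
    # predicted from all other points, weighted by a Gaussian kernel of width h
    W = np.exp(-(x[:, None] - x[None, :])**2 / (2 * h**2))
    np.fill_diagonal(W, 0.0)  # exclude the held-out point itself
    y_hat = (W @ y) / W.sum(axis=1)
    return np.mean((y - y_hat)**2)

def select_bandwidth(x, y, grid):
    # Pick the candidate bandwidth with the smallest LOO-CV error
    return min(grid, key=lambda h: nw_loocv_score(x, y, h))
```

Because the full weight matrix is recomputed for every candidate bandwidth, this grid search costs O(n^2) per candidate, which is the computational burden that plug-in and rule-based selectors aim to avoid.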
To understand deep learning we need to understand kernel learning
It is argued that progress on understanding deep learning will be difficult until more tractable "shallow" kernel methods are better understood, and that new theoretical ideas are needed for understanding the properties of classical kernel methods.
Surprises in High-Dimensional Ridgeless Least Squares Interpolation
This paper recovers, in a precise quantitative way, several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk and the potential benefits of overparametrization.
Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting
Locally weighted regression, or loess, is a way of estimating a regression surface through a multivariate smoothing procedure, fitting a function of the independent variables locally and in…