Faster Kernel Ridge Regression Using Sketching and Preconditioning

Haim Avron, Kenneth L. Clarkson, and David P. Woodruff
SIAM J. Matrix Anal. Appl.
Kernel ridge regression is a simple yet powerful technique for nonparametric regression whose computation amounts to solving a linear system. This system is usually dense and highly ill-conditioned, and its dimensions equal the number of data points, so direct methods are unrealistic for large-scale datasets. In this paper, we propose a preconditioning technique for accelerating the solution of the aforementioned linear system. The preconditioner is based on…
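The abstract above describes the core computation: solve (K + λI)α = y iteratively, with a cheap approximate inverse as preconditioner. A minimal sketch of the idea, using a Nyström-based preconditioner as a generic stand-in (the paper's actual preconditioner is sketching-based; all sizes, the kernel, and the rank below are illustrative choices, not from the paper):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)

# Illustrative toy problem
n, d, lam, r = 500, 3, 1e-2, 50
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)                       # Gaussian kernel matrix
A = K + lam * np.eye(n)                     # dense, ill-conditioned system

# Rank-r Nystrom approximation of K, used only as a preconditioner
idx = rng.choice(n, r, replace=False)
C = K[:, idx]                               # n x r sampled columns
W = K[np.ix_(idx, idx)] + 1e-8 * np.eye(r)  # r x r block (jittered)
S = lam * W + C.T @ C

def apply_precond(v):
    # Woodbury inverse of  C W^{-1} C^T + lam I
    return (v - C @ np.linalg.solve(S, C.T @ v)) / lam

M = LinearOperator((n, n), matvec=apply_precond)
alpha, info = cg(A, y, M=M)                 # preconditioned CG solve
```

Each CG iteration costs one dense matrix-vector product with A plus one O(nr) preconditioner application; a good preconditioner keeps the iteration count small even though A is ill-conditioned.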

Fast and Accurate Gaussian Kernel Ridge Regression Using Matrix Decompositions for Preconditioning

The suggested approach is based on randomized matrix decomposition methods, combined with the fast multipole method, to achieve an algorithm that can process large datasets with complexity linear in the number of data points.

Sharper Bounds for Regression and Low-Rank Approximation with Regularization

Sketching methods for regularized variants of linear regression, low rank approximations, and canonical correlation analysis are studied, both in a fairly broad setting, and in the specific context of the popular and widely used technique of ridge regularization.

An Iterative, Sketching-based Framework for Ridge Regression

It is proved that accurate approximations can be achieved by a sample whose size depends on the degrees of freedom of the ridge-regression problem rather than on the dimensions of the design matrix; ridge regression is a fundamental and well-understood primitive of randomized linear algebra.

Near Input Sparsity Time Kernel Embeddings via Adaptive Sampling

A near input sparsity time algorithm for sampling the high-dimensional feature space implicitly defined by a kernel transformation, whose subspace embedding bounds imply new statistical guarantees for kernel ridge regression.

Random Fourier Features for Kernel Ridge Regression: Approximation Bounds and Statistical Guarantees

The results are twofold: on the one hand, it is shown that random Fourier feature approximation can provably speed up kernel ridge regression under reasonable assumptions, and on the other hand, the method is suboptimal, and sampling from a modified distribution in Fourier space, given by the leverage function of the kernel, yields provably better performance.
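The random Fourier feature map discussed here can be sketched in a few lines. The construction below is the standard Rahimi–Recht map for the Gaussian kernel (the feature count D and bandwidth are illustrative choices; the leverage-based sampling the entry mentions is not implemented):

```python
import numpy as np

rng = np.random.default_rng(0)

def rff(X, D=200, sigma=1.0, rng=rng):
    """Map X (n x d) to D random Fourier features whose inner products
    approximate the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    n, d = X.shape
    W = rng.standard_normal((d, D)) / sigma   # frequencies ~ N(0, 1/sigma^2)
    b = rng.uniform(0.0, 2.0 * np.pi, D)      # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# z(x)^T z(y) concentrates around k(x, y) as D grows
X = rng.standard_normal((5, 3))
Z = rff(X, D=20000)
K_approx = Z @ Z.T
sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
K_exact = np.exp(-sq / 2.0)
```

With such features, KRR reduces to ordinary ridge regression in D dimensions: solve (Zᵀ Z + λI) w = Zᵀ y, which costs O(nD²) instead of O(n³).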

Learning in High-Dimensional Feature Spaces Using ANOVA-Based Fast Matrix-Vector Multiplication

This work proposes the use of an ANOVA kernel, where several kernels are constructed based on lower-dimensional feature spaces for which the non-equispaced fast Fourier transform (NFFT) is employed, which is of linear complexity for fixed accuracy.

Scaling up Kernel Ridge Regression via Locality Sensitive Hashing

A simple weighted version of random binning features is introduced and it is shown that the corresponding kernel function generates Gaussian processes of any desired smoothness, leading to efficient algorithms for kernel ridge regression.

Training very large scale nonlinear SVMs using Alternating Direction Method of Multipliers coupled with the Hierarchically Semi-Separable kernel approximations

The detailed analysis of the interaction among their algorithmic components unveils a particularly efficient framework; indeed, the presented experimental results demonstrate a significant speed-up compared to state-of-the-art nonlinear SVM libraries, without significantly affecting classification accuracy.

Scalable and Memory-Efficient Kernel Ridge Regression

A scalable and memory-efficient framework for kernel ridge regression that relies on a hierarchy of low-rank factorizations of tunable accuracy, and provides sufficient accuracy in comparison with state-of-the-art methods and with the exact (i.e., non-approximated) kernel ridge regression method.

Oblivious Sketching of High-Degree Polynomial Kernels

This work gives a general method for applying sketching techniques developed in numerical linear algebra over the past decade to a tensoring of data points without forming the tensoring explicitly, and leads to the first oblivious sketch for the polynomial kernel with a target dimension that depends only polynomially on the degree of the kernel.

Fast Randomized Kernel Ridge Regression with Statistical Guarantees

A version of this approach that comes with running time guarantees as well as improved guarantees on its statistical performance is described, and a fast algorithm is presented to quickly compute coarse approximations to these scores in time linear in the number of samples.

Sharp analysis of low-rank kernel matrix approximations

This paper shows that in the context of kernel ridge regression, for approximations based on a random subset of columns of the original kernel matrix, the rank p may be chosen to be linear in the degrees of freedom associated with the problem, a quantity classically used in the statistical analysis of such methods.
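A column-sampling (Nyström) approximation of the kind analyzed here, combined with the Woodbury identity, solves the regularized system in O(nr²) time for rank r. The sketch below uses uniform column sampling and illustrative sizes (the paper's analysis concerns how large r must be, not a specific implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, r, lam = 400, 2, 40, 1e-1
X = rng.standard_normal((n, d))
y = np.cos(X[:, 0]) + 0.1 * rng.standard_normal(n)

sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)                       # exact Gaussian kernel matrix

idx = rng.choice(n, r, replace=False)       # uniform column sample
C = K[:, idx]                               # n x r sampled columns
W = K[np.ix_(idx, idx)] + 1e-6 * np.eye(r)  # r x r block (jittered)

# Nystrom approximation K ~ C W^{-1} C^T; apply Woodbury to invert
# (C W^{-1} C^T + lam I) in O(n r^2) instead of O(n^3)
S = lam * W + C.T @ C
alpha = (y - C @ np.linalg.solve(S, C.T @ y)) / lam
```

Only the r × r system S is ever factored; the full n × n kernel matrix never needs to be inverted.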

Scalable Kernel Methods via Doubly Stochastic Gradients

An approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients," exploiting the fact that many kernel methods can be expressed as convex optimization problems; it can readily scale kernel methods up to regimes dominated by neural nets.

Subspace Embeddings for the Polynomial Kernel

This work proposes the first fast oblivious subspace embeddings that are able to embed a space induced by a non-linear kernel without explicitly mapping the data to the high-dimensional space.

Randomized sketches for kernels: Fast and optimal non-parametric regression

It is proved that it suffices to choose the sketch dimension m proportional to the statistical dimension (modulo logarithmic factors) of the kernel matrix, yielding fast and minimax-optimal approximations to the KRR estimate for non-parametric regression.
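As a generic illustration of the sketch-and-solve idea behind such results (shown for ridge regression with explicit linear features for simplicity, not the kernelized construction of the paper; all sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, lam = 2000, 20, 200, 1e-1
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

# Exact ridge solution on all n rows
x_exact = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b)

# Gaussian sketch: compress n rows to m << n, then solve the small problem
S = rng.standard_normal((m, n)) / np.sqrt(m)
SA, Sb = S @ A, S @ b
x_sketch = np.linalg.solve(SA.T @ SA + lam * np.eye(d), SA.T @ Sb)
```

The sketched problem has m rows instead of n, so the dominant cost drops from O(nd²) to the cost of forming the sketch plus O(md²); the cited result says how small m can be (roughly the statistical dimension) while preserving accuracy.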

Preconditioned Krylov solvers for kernel regression

A novel flexible preconditioner that not only improves convergence but also allows utilization of fast kernel matrix-vector products is introduced.

Preconditioning Kernel Matrices

A scalable approach to both solving kernel machines and learning their hyperparameters is described, and it is shown this approach is exact in the limit of iterations and outperforms state-of-the-art approximations for a given computational budget.

Optimal learning rates for Kernel Conjugate Gradient regression

We prove rates of convergence in the statistical sense for kernel-based least squares regression using a conjugate gradient algorithm, where regularization against overfitting is obtained by early stopping.

Sharper Bounds for Regularized Data Fitting

This work studies matrix sketching methods for regularized variants of linear regression, low-rank approximation, and canonical correlation analysis, obtaining sketching-based algorithms for the regularized low-rank approximation problem.