# Subspace Embeddings for the Polynomial Kernel

@inproceedings{Avron2014SubspaceEF, title={Subspace Embeddings for the Polynomial Kernel}, author={Haim Avron and Huy L. Nguyen and David P. Woodruff}, booktitle={NIPS}, year={2014} }

Sketching is a powerful dimensionality reduction tool for accelerating statistical learning algorithms. However, its applicability has been limited to a certain extent since the crucial ingredient, the so-called oblivious subspace embedding, can only be applied to data spaces with an explicit representation as the column span or row span of a matrix, while in many settings learning is done in a high-dimensional space implicitly defined by the data matrix via a kernel transformation. We propose…

## 81 Citations

Near Input Sparsity Time Kernel Embeddings via Adaptive Sampling

- Computer Science
- ICML
- 2020

This work presents a near input sparsity time algorithm for sampling the high-dimensional feature space implicitly defined by a kernel transformation, and shows how its subspace embedding bounds imply new statistical guarantees for kernel ridge regression.

Oblivious Sketching of High-Degree Polynomial Kernels

- Computer Science, Mathematics
- SODA
- 2020

This work presents a general method for applying sketching solutions developed in numerical linear algebra over the past decade to a tensoring of data points without forming the tensoring explicitly, leading to the first oblivious sketch for the polynomial kernel with a target dimension that is only polynomially dependent on the degree of the kernel function.

Relative Error RKHS Embeddings for Gaussian Kernels

- Computer Science
- ArXiv
- 2018

The main insight is to modify the well-traveled random Fourier features so that they are slightly biased and have higher variance, but can be defined as a convolution over the function space.

Faster Kernel Ridge Regression Using Sketching and Preconditioning

- Computer Science
- SIAM J. Matrix Anal. Appl.
- 2017

This paper proposes a preconditioning technique based on random feature maps, such as random Fourier features, which have recently emerged as a powerful technique for speeding up and scaling the training of kernel-based methods by resorting to approximations.

In-Database Regression in Input Sparsity Time

- Computer Science
- ICML
- 2021

This work designs subspace embeddings for database joins which can be computed significantly faster than computing the join, and extends the results to arbitrary joins for the ridge regression problem, significantly improving upon the running time of prior FAQ-based methods for regression.

Fast Sketching of Polynomial Kernels of Polynomial Degree

- Computer Science
- ICML
- 2021

A new oblivious sketch is given which greatly improves the running time of the fastest algorithms for approximating a large family of slow-growing kernels, by removing the dependence on q in the leading order term.

Exponentially Improved Dimensionality Reduction for $\ell_1$: Subspace Embeddings and Independence Testing

- Mathematics, Computer Science
- COLT
- 2021

The linear map gives a streaming algorithm for independence testing that improves on the space bound of Braverman and Ostrovsky (STOC, 2010). For subspace embeddings, the setting in which A is itself drawn from a distribution with independent entries is studied, and a polynomial embedding dimension is obtained.

Relative Error Embeddings of the Gaussian Kernel Distance

- Computer Science
- ALT
- 2017

This paper shows that, for the Gaussian kernel, the Euclidean distance between the mapped features has $(1+\epsilon)$-relative error with respect to the kernel distance.

Lower Memory Oblivious (Tensor) Subspace Embeddings with Fewer Random Bits: Modewise Methods for Least Squares

- Computer Science
- SIAM J. Matrix Anal. Appl.
- 2021

Applications related to compression and fast compressed least squares solution methods are considered, including those used for fitting low-rank CP decompositions, and the proposed JL embedding results are shown to work well numerically in both settings.

## References

Fast and scalable polynomial kernels via explicit feature maps

- Computer Science
- KDD
- 2013

A novel randomized tensor product technique, called Tensor Sketching, is proposed for approximating any polynomial kernel in $O(n(d + D \log D))$ time, and achieves higher accuracy and often runs orders of magnitude faster than the state-of-the-art approach for large-scale real-world datasets.
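
As a rough illustration of the Tensor Sketching idea summarized above, the sketch below combines q independent CountSketches through the FFT, so that inner products of sketched rows approximate $\langle x, y \rangle^q$. The function name `tensor_sketch`, the fully random hashes, and the dense hash matrix are assumptions made for brevity; this is not the authors' implementation, and the dense matrix product does not achieve the stated running time.

```python
import numpy as np

def tensor_sketch(X, q, D, seed=0):
    """Sketch each row of X (n x d) into D dimensions for the kernel <x, y>**q.

    Hypothetical helper: uses fully random hashes and a dense d x D hash
    matrix for readability, not speed.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    prod = np.ones((n, D), dtype=complex)
    for _ in range(q):
        h = rng.integers(0, D, size=d)        # bucket for each coordinate
        s = rng.choice([-1.0, 1.0], size=d)   # random sign for each coordinate
        S = np.zeros((d, D))
        S[np.arange(d), h] = s                # CountSketch as a d x D matrix
        # Multiplying FFTs corresponds to circular convolution of the sketches.
        prod *= np.fft.fft(X @ S, axis=1)
    return np.real(np.fft.ifft(prod, axis=1))

# Usage: compare sketched inner products with the exact degree-2 kernel.
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 20))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm rows
Z = tensor_sketch(X, q=2, D=2**14, seed=0)
max_err = np.max(np.abs(Z @ Z.T - (X @ X.T) ** 2))
print("max abs kernel error:", max_err)
```

The same seed must be used for every point being compared, since the approximation relies on all points sharing the same hash functions.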

Low-Rank Approximation and Regression in Input Sparsity Time

- Computer Science
- ArXiv
- 2012

We design a new distribution over $m \times n$ matrices $S$ so that, for any fixed $n \times d$ matrix $A$ of rank $r$, with probability at least 9/10, $\|SAx\|_2 = (1 \pm \varepsilon)\|Ax\|_2$ simultaneously for all $x \in \mathbb{R}^d$. Here, $m$ is…
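
To make the guarantee above concrete, here is a minimal sketch that applies a CountSketch-style matrix $S$ (one random $\pm 1$ entry per column) to a tall matrix $A$ and checks the norm ratio on a few random vectors $x$. Treating this construction as the distribution the entry refers to is an assumption, and the name `countsketch_apply` and all sizes are illustrative only.

```python
import numpy as np

def countsketch_apply(A, m, rng):
    """Compute S @ A without forming S, where S is m x n with one +/-1 per column."""
    n, d = A.shape
    rows = rng.integers(0, m, size=n)         # target row for each input row
    signs = rng.choice([-1.0, 1.0], size=n)   # random sign for each input row
    SA = np.zeros((m, d))
    # np.add.at accumulates correctly when several input rows share a target row.
    np.add.at(SA, rows, signs[:, None] * A)
    return SA

rng = np.random.default_rng(0)
A = rng.standard_normal((20000, 5))
SA = countsketch_apply(A, m=2000, rng=rng)

# Ratio ||SAx|| / ||Ax|| for 100 random directions x; values near 1 are expected.
xs = rng.standard_normal((5, 100))
ratios = np.linalg.norm(SA @ xs, axis=0) / np.linalg.norm(A @ xs, axis=0)
print("min ratio:", ratios.min(), "max ratio:", ratios.max())
```

Each input row is touched once, so the cost of forming `SA` is proportional to the number of nonzeros in `A`, which is the sense of "input sparsity time".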

Compact Random Feature Maps

- Computer Science
- ICML
- 2014

Error bounds for CRAFTMaps are proved, demonstrating their superior kernel reconstruction performance compared to previous approximation schemes, and it is shown how structured random matrices can be used to generate CRAFTMaps efficiently.

Numerical linear algebra in the streaming model

- Computer Science
- STOC '09
- 2009

Near-optimal space bounds are given in the streaming model for linear algebra problems that include estimation of matrix products, linear regression, low-rank approximation, and approximation of matrix rank; results for turnstile updates are proved.

Sketching Structured Matrices for Faster Nonlinear Regression

- Computer Science, Mathematics
- NIPS
- 2013

This work considers a class of structured regression problems which involve Vandermonde matrices which arise naturally in various statistical modeling settings, and shows that this structure can be exploited to further accelerate the solution of the regression problem.

Random Features for Large-Scale Kernel Machines

- Computer Science
- NIPS
- 2007

Two sets of random features are explored, convergence bounds are provided on their ability to approximate various radial basis kernels, and it is shown that in large-scale classification and regression tasks linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernel machines.
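
The following is a minimal sketch of one common form of random Fourier features for the Gaussian kernel $\exp(-\|x - y\|^2 / (2\sigma^2))$, using cosine features with a random phase. The function name `random_fourier_features`, the bandwidth `sigma`, and the feature count are assumptions chosen for illustration rather than details taken from the paper.

```python
import numpy as np

def random_fourier_features(X, D, sigma=1.0, seed=0):
    """Map rows of X (n x d) to D features whose inner products approximate
    the Gaussian kernel with bandwidth sigma."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, D)) / sigma    # frequencies drawn from N(0, I / sigma^2)
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)  # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 3))
Z = random_fourier_features(X, D=4000, sigma=1.0, seed=0)

# Exact Gaussian kernel matrix for comparison.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dists / 2.0)
max_err = np.max(np.abs(Z @ Z.T - K))
print("max abs kernel error:", max_err)
```

Once the features `Z` are computed, any linear method (for example ridge regression on `Z`) can stand in for the corresponding kernel method.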

OSNAP: Faster Numerical Linear Algebra Algorithms via Sparser Subspace Embeddings

- Computer Science
- 2013 IEEE 54th Annual Symposium on Foundations of Computer Science
- 2013

The main result is essentially a Bai-Yin type theorem in random matrix theory and is likely to be of independent interest: for any fixed $U \in \mathbb{R}^{n \times d}$ with orthonormal columns and random sparse $\Pi$, all singular values of $\Pi U$ lie in $[1 - \varepsilon, 1 + \varepsilon]$ with good probability.
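
The statement above can be checked numerically on a small instance. The sketch below assumes a sparse $\Pi$ with exactly `s` nonzeros per column, each equal to $\pm 1/\sqrt{s}$ and placed in distinct random rows; the name `osnap_apply` and the sizes are hypothetical choices, and the code computes $\Pi U$ directly rather than storing $\Pi$.

```python
import numpy as np

def osnap_apply(U, m, s, rng):
    """Compute Pi @ U for a sparse m x n matrix Pi with s nonzeros per column,
    each +/- 1/sqrt(s), placed in s distinct random rows."""
    n, d = U.shape
    # rows[j] holds the s distinct target rows for column j of Pi.
    rows = np.array([rng.choice(m, size=s, replace=False) for _ in range(n)])
    signs = rng.choice([-1.0, 1.0], size=(n, s))
    PiU = np.zeros((m, d))
    for t in range(s):
        # Accumulate the t-th nonzero of every column; duplicates are handled by add.at.
        np.add.at(PiU, rows[:, t], signs[:, t, None] * U / np.sqrt(s))
    return PiU

rng = np.random.default_rng(0)
n, d, m, s = 3000, 5, 1000, 4
U, _ = np.linalg.qr(rng.standard_normal((n, d)))  # orthonormal columns
sv = np.linalg.svd(osnap_apply(U, m, s, rng), compute_uv=False)
print("singular values of Pi U:", sv)
```

Singular values close to 1 mean that $\Pi$ approximately preserves the norm of every vector in the column span of $U$.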

Fastfood: Approximate Kernel Expansions in Loglinear Time

- Computer Science
- ICML 2013
- 2013

Improvements to Fastfood, an approximation that accelerates kernel methods significantly and achieves similar accuracy to full kernel expansions and Random Kitchen Sinks while being 100x faster and using 1000x less memory, make kernel methods more practical for applications that have large training sets and/or require real-time prediction.

Relative-Error CUR Matrix Decompositions

- Computer Science, Mathematics
- SIAM J. Matrix Anal. Appl.
- 2008

These two algorithms are the first polynomial time algorithms for such low-rank matrix approximations that come with relative-error guarantees; previously, in some cases, it was not even known whether such matrix decompositions exist.

Randomized Algorithms for Matrices and Data

- Computer Science
- Found. Trends Mach. Learn.
- 2011

This monograph will provide a detailed overview of recent work on the theory of randomized matrix algorithms as well as the application of those ideas to the solution of practical problems in large-scale data analysis.