# Near Input Sparsity Time Kernel Embeddings via Adaptive Sampling

@inproceedings{Woodruff2020NearIS,
  title={Near Input Sparsity Time Kernel Embeddings via Adaptive Sampling},
  author={David P. Woodruff and Amir Zandieh},
  booktitle={ICML},
  year={2020}
}

To accelerate kernel methods, we propose a near input sparsity time algorithm for sampling the high-dimensional feature space implicitly defined by a kernel transformation. Our main contribution is an importance sampling method for subsampling the feature space of a degree $q$ tensoring of data points in almost input sparsity time, improving on the recent oblivious sketching method of Ahle et al. (2020) by a factor of $q^{5/2}/\epsilon^2$. This leads to a subspace embedding for the polynomial…
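As a toy illustration of importance sampling over the feature space of a degree-$q$ tensoring, the sketch below forms the explicit degree-2 tensored features of a small Gaussian dataset and subsamples feature coordinates proportionally to their squared column norms. This norm-based scheme and all sizes are illustrative simplifications, not the paper's adaptive leverage-score algorithm, which never materializes the tensored features.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, q = 50, 6, 2          # n points in dimension d, tensoring degree q = 2
m = 400                     # number of sampled feature coordinates

X = rng.standard_normal((n, d))

# Explicit degree-2 tensored features: phi(x) = x tensor x, of dimension d**q.
Phi = np.einsum("ni,nj->nij", X, X).reshape(n, d ** q)

# Importance weights: squared column norms (a simplified proxy for the
# adaptive leverage scores used in the paper).
w = (Phi ** 2).sum(axis=0)
p = w / w.sum()

# Sample m coordinates with replacement and rescale so the sampled Gram
# matrix S S^T is an unbiased estimate of Phi Phi^T.
idx = rng.choice(d ** q, size=m, p=p)
S = Phi[:, idx] / np.sqrt(m * p[idx])

K_exact = (X @ X.T) ** q    # the polynomial kernel <x, y>**q equals Phi Phi^T
K_approx = S @ S.T
rel_err = np.linalg.norm(K_approx - K_exact) / np.linalg.norm(K_exact)
print(rel_err)
```

The rescaling by $1/\sqrt{m\,p_k}$ makes each sampled coordinate's contribution unbiased, so the approximation error shrinks as more coordinates are drawn.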


## 14 Citations

### Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time

- Computer Science, Mathematics · ICML
- 2022

We propose an input sparsity time sampling algorithm that can spectrally approximate the Gram matrix corresponding to the q-fold column-wise tensor product of q matrices using a nearly optimal number…

### Fast Sketching of Polynomial Kernels of Polynomial Degree

- Computer Science · ICML
- 2021

A new oblivious sketch is given which greatly improves the running time of the fastest algorithms for approximating a large family of slow-growing kernels, by removing the dependence on q in the leading order term.

### Fast Algorithms for Monotone Lower Subsets of Kronecker Least Squares Problems

- Computer Science, Mathematics
- 2022

This paper develops efficient leverage score-based sampling methods for matrices with certain Kronecker product-type structure, and numerical examples show that sketches based on exact leverage score sampling for a class of structured matrices achieve superior residuals compared to approximate leverage score sampling methods.

### Random Gegenbauer Features for Scalable Kernel Methods

- Computer Science · ICML
- 2022

This work proposes efficient random features for approximating a new and rich class of kernel functions that it refers to as Generalized Zonal Kernels (GZK), and proves subspace embedding guarantees for Gegenbauer features which ensures that these features can be used for approximately solving learning problems such as kernel k-means clustering, kernel ridge regression, etc.

### Random Features for Kernel Approximation: A Survey on Algorithms, Theory, and Beyond

- Computer Science · IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2022

This survey systematically reviews work on random features from the past ten years and discusses the relationship between random features and modern over-parameterized deep neural networks (DNNs), including the use of high-dimensional random features in the analysis of DNNs as well as the gaps between current theoretical and empirical results.

### Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time

- Computer Science · arXiv
- 2021

This work proposes a framework that incurs the full cost in the network width m only during the initialization phase and achieves a truly subquadratic cost per iteration in terms of m, making use of various techniques including a shifted ReLU-based sparsifier, a lazy low-rank maintenance data structure, fast rectangular matrix multiplication, tensor-based sketching techniques, and preconditioning.

### In-Database Regression in Input Sparsity Time

- Computer Science · ICML
- 2021

This work designs subspace embeddings for database joins which can be computed significantly faster than computing the join itself, and extends the results to arbitrary joins for the ridge regression problem, significantly improving upon the running time of prior FAQ-based methods for regression.

### Scaling Neural Tangent Kernels via Sketching and Random Features

- Computer Science · NeurIPS
- 2021

A near input-sparsity time approximation algorithm for the NTK is designed by sketching the polynomial expansions of arc-cosine kernels; the method comes with a spectral approximation guarantee for the NTK matrix, and any image can be transformed in time linear in the number of pixels.

### Kernel approximation on algebraic varieties

- Computer Science · arXiv
- 2021

The main technical insight is to approximate smooth kernels by polynomial kernels, and to leverage two key properties of polynomial kernels that hold when they are restricted to a variety.

### Random Features for the Neural Tangent Kernel

- Computer Science · arXiv
- 2021

This work proposes an efficient feature map construction for the NTK of a fully-connected ReLU network, which enables applying it to large-scale datasets; the dimension of the resulting features is much smaller than that of other baseline feature map constructions while achieving comparable error bounds, both in theory and in practice.

## References

Showing 1–10 of 22 references.

### Oblivious Sketching of High-Degree Polynomial Kernels

- Computer Science, Mathematics · SODA
- 2020

This work is a general method for applying sketching solutions developed in numerical linear algebra over the past decade to a tensoring of data points without forming the tensoring explicitly, and leads to the first oblivious sketch for the polynomial kernel with a target dimension that is only polynomially dependent on the degree of the kernel function.

### Recursive Sampling for the Nystrom Method

- Computer Science · NIPS
- 2017

We give the first algorithm for kernel Nystrom approximation that runs in linear time in the number of training points and is provably accurate for all kernel matrices, without dependence on…

### Faster Kernel Ridge Regression Using Sketching and Preconditioning

- Computer Science · SIAM J. Matrix Anal. Appl.
- 2017

This paper proposes a preconditioning technique based on random feature maps, such as random Fourier features, which have recently emerged as a powerful technique for speeding up and scaling the training of kernel-based methods by resorting to approximations.

### Fast Randomized Kernel Ridge Regression with Statistical Guarantees

- Computer Science · NIPS
- 2015

A version of this approach that comes with running time guarantees as well as improved guarantees on its statistical performance is described, and a fast algorithm is presented to quickly compute coarse approximations to these scores in time linear in the number of samples.

### Random Features for Large-Scale Kernel Machines

- Computer Science · NIPS
- 2007

Two sets of random features are explored, convergence bounds are provided on their ability to approximate various radial basis kernels, and it is shown that in large-scale classification and regression tasks, linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernel machines.
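The random Fourier feature construction summarized above can be sketched for the Gaussian kernel as follows: sample frequencies from the kernel's Fourier transform, add uniform phases, and apply a cosine. The dimensions and bandwidth below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, D, sigma = 100, 5, 4000, 1.0   # D random features (illustrative sizes)

X = rng.standard_normal((n, d))

# Random Fourier features for the Gaussian kernel
# k(x, y) = exp(-||x - y||^2 / (2 sigma^2)): frequencies are drawn from the
# kernel's (Gaussian) Fourier transform, phases uniformly from [0, 2*pi).
W = rng.standard_normal((d, D)) / sigma
b = rng.uniform(0.0, 2.0 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)

# The feature inner products Z Z^T approximate the exact Gaussian kernel.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-sq_dists / (2.0 * sigma ** 2))
max_err = np.abs(Z @ Z.T - K_exact).max()
print(max_err)
```

In expectation each feature inner product equals the kernel value, so the pointwise error decays roughly as $1/\sqrt{D}$.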

### Scaling up Kernel Ridge Regression via Locality Sensitive Hashing

- Computer Science · AISTATS
- 2020

A simple weighted version of random binning features is introduced and it is shown that the corresponding kernel function generates Gaussian processes of any desired smoothness, leading to efficient algorithms for kernel ridge regression.

### Random Fourier Features for Kernel Ridge Regression: Approximation Bounds and Statistical Guarantees

- Computer Science · ICML
- 2017

The results are twofold: on the one hand, it is shown that random Fourier feature approximation can provably speed up kernel ridge regression under reasonable assumptions, and on the other hand, the method is suboptimal, and sampling from a modified distribution in Fourier space, given by the leverage function of the kernel, yields provably better performance.

### Input Sparsity Time Low-rank Approximation via Ridge Leverage Score Sampling

- Computer Science · SODA
- 2017

We present a new algorithm for finding a near optimal low-rank approximation of a matrix $A$ in $O(nnz(A))$ time. Our method is based on a recursive sampling scheme for computing a representative…
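Ridge leverage score sampling, the ingredient named above, can be illustrated with a small sketch. Here the scores are computed exactly via a matrix inverse rather than by the paper's recursive nnz-time scheme, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, lam, m = 500, 10, 0.1, 200     # illustrative sizes

A = rng.standard_normal((n, d))

# Exact lambda-ridge leverage scores: tau_i = a_i^T (A^T A + lam I)^{-1} a_i.
G_inv = np.linalg.inv(A.T @ A + lam * np.eye(d))
tau = np.einsum("ij,jk,ik->i", A, G_inv, A)
d_eff = tau.sum()                    # effective dimension, at most d

# Sample m rows proportionally to tau and rescale; B^T B is an unbiased
# estimate of A^T A and spectrally approximates it for large enough m.
p = tau / tau.sum()
idx = rng.choice(n, size=m, p=p)
B = A[idx] / np.sqrt(m * p[idx])[:, None]

err = np.linalg.norm(B.T @ B - A.T @ A, 2) / np.linalg.norm(A.T @ A, 2)
print(d_eff, err)
```

The sum of the scores, the effective dimension, controls how many samples suffice; the regularizer lam damps the scores of rows that matter only below the lam threshold.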

### Online Row Sampling

- Computer Science · APPROX-RANDOM
- 2016

This work presents an extremely simple algorithm that approximates $A$ up to multiplicative error $\epsilon$ and additive error $\delta$ using $O(d \log d \log(\epsilon\|A\|_2/\delta)/\epsilon^2)$ online samples, with memory overhead proportional to the cost of storing the spectral approximation.

### Concentration Inequalities - A Nonasymptotic Theory of Independence

- Mathematics · Concentration Inequalities
- 2013

Deep connections with isoperimetric problems are revealed whilst special attention is paid to applications to the supremum of empirical processes.