Corpus ID: 220404291

Near Input Sparsity Time Kernel Embeddings via Adaptive Sampling

@inproceedings{Woodruff2020NearIS,
  title={Near Input Sparsity Time Kernel Embeddings via Adaptive Sampling},
  author={David P. Woodruff and Amir Zandieh},
  booktitle={ICML},
  year={2020}
}
To accelerate kernel methods, we propose a near input sparsity time algorithm for sampling the high-dimensional feature space implicitly defined by a kernel transformation. Our main contribution is an importance sampling method for subsampling the feature space of a degree $q$ tensoring of data points in almost input sparsity time, improving the recent oblivious sketching method of (Ahle et al., 2020) by a factor of $q^{5/2}/\epsilon^2$. This leads to a subspace embedding for the polynomial… 
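
A minimal numpy sketch of the underlying object is given below: it approximates the degree-$q$ polynomial kernel by uniformly subsampling coordinates of the explicit tensor-product feature map. This is only an illustration of what is being sampled, not the paper's adaptive importance-sampling algorithm; the dimensions, degree, and sample count are arbitrary choices.

```python
# Illustration only: approximate the degree-q polynomial kernel (x . y)^q by
# uniformly subsampling coordinates of the explicit feature map x^{tensor q}.
# The paper replaces uniform sampling with importance sampling computed in
# near input sparsity time.
import numpy as np

def sampled_tensor_features(X, q, m, rng):
    """Subsample m coordinates of the degree-q tensoring of the rows of X.

    Each coordinate of the exact feature map is indexed by a q-tuple
    (i_1, ..., i_q) and equals prod_k x[i_k]; uniform sampling of tuples with
    the d^(q/2)/sqrt(m) rescaling below makes the sampled inner products
    unbiased estimates of (x . y)^q.
    """
    n, d = X.shape
    idx = rng.integers(0, d, size=(m, q))       # m uniformly random q-tuples
    feats = np.ones((n, m))
    for k in range(q):
        feats *= X[:, idx[:, k]]                # multiply in the k-th chosen entry
    return feats * d ** (q / 2) / np.sqrt(m)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))
q, m = 3, 20000
Z = sampled_tensor_features(X, q, m, rng)
exact = (X @ X.T) ** q                          # exact degree-q polynomial kernel
print(np.max(np.abs(Z @ Z.T - exact)) / np.max(np.abs(exact)))
```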

Citations

Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time

We propose an input sparsity time sampling algorithm that can spectrally approximate the Gram matrix corresponding to the q-fold column-wise tensor product of q matrices using a nearly optimal number of samples.

Fast Sketching of Polynomial Kernels of Polynomial Degree

TLDR
A new oblivious sketch is given which greatly improves the running time of the fastest algorithms for approximating a large family of slow-growing kernels, by removing the dependence on q in the leading order term.

Fast Algorithms for Monotone Lower Subsets of Kronecker Least Squares Problems

TLDR
This paper develops efficient leverage score-based sampling methods for matrices with certain Kronecker product-type structure, and numerical examples show that sketches based on exact leverage score sampling for a class of structured matrices achieve superior residuals compared to approximate leverage score sampling methods.

Random Gegenbauer Features for Scalable Kernel Methods

TLDR
This work proposes efficient random features for approximating a new and rich class of kernel functions, referred to as Generalized Zonal Kernels (GZK), and proves subspace embedding guarantees for Gegenbauer features, which ensure that these features can be used for approximately solving learning problems such as kernel k-means clustering and kernel ridge regression.

Random Features for Kernel Approximation: A Survey on Algorithms, Theory, and Beyond

TLDR
This survey systematically reviews the work on random features from the past ten years and discusses the relationship between random features and modern over-parameterized deep neural networks (DNNs), including the use of high-dimensional random features in the analysis of DNNs as well as the gaps between current theoretical and empirical results.

Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time

TLDR
This work proposes a framework that uses $m^2$ cost only in the initialization phase and achieves a truly subquadratic cost per iteration in terms of $m$, i.e., $m^{2-\Omega(1)}$ per iteration, and makes use of various techniques, including a shifted ReLU-based sparsifier, a lazy low-rank maintenance data structure, fast rectangular matrix multiplication, tensor-based sketching techniques, and preconditioning.

In-Database Regression in Input Sparsity Time

TLDR
This work designs subspace embeddings for database joins which can be computed significantly faster than computing the join, and extends the results to arbitrary joins for the ridge regression problem, significantly improving upon the running time of prior FAQ-based methods for regression.

Scaling Neural Tangent Kernels via Sketching and Random Features

TLDR
A near input-sparsity time approximation algorithm for the NTK is designed by sketching the polynomial expansions of arc-cosine kernels; a spectral approximation guarantee is proved for the NTK matrix, and the corresponding feature transform of any image is shown to run in time linear in the number of pixels.

Kernel approximation on algebraic varieties

TLDR
The main technical insight is to approximate smooth kernels by polynomial kernels, and to leverage two key properties of polynomial kernels that hold when they are restricted to a variety.
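
As a toy illustration of that insight (not the cited paper's variety-restricted analysis), the snippet below approximates the smooth kernel $\exp(x \cdot y)$ by truncated-Taylor polynomial kernels; the data scale and the degrees shown are arbitrary choices.

```python
# Approximating a smooth kernel by polynomial kernels: truncating the Taylor
# series of exp(x . y) at degree `deg` gives a sum of polynomial kernels, and
# the error decays quickly with the degree when inner products are O(1).
import numpy as np
from math import factorial

rng = np.random.default_rng(7)
X = rng.standard_normal((6, 5)) / np.sqrt(5)        # keep inner products O(1)
G = X @ X.T
K_exact = np.exp(G)                                  # exponential (smooth) kernel
for deg in (2, 4, 8):
    K_poly = sum(G ** j / factorial(j) for j in range(deg + 1))
    print(deg, np.max(np.abs(K_exact - K_poly)))
```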

Random Features for the Neural Tangent Kernel

TLDR
This work proposes an efficient feature map construction for the NTK of fully-connected ReLU networks, which enables applying it to large-scale datasets, and shows that the dimension of the resulting features is much smaller than in other baseline feature map constructions while achieving comparable error bounds in both theory and practice.
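
For context, the snippet below checks the basic ingredient behind such feature maps: ReLU random features converge to their closed-form expectation, the first-order arc-cosine kernel. This is only a sanity check under assumed Gaussian weights and an illustrative width m, not the paper's full NTK feature construction.

```python
# ReLU random features vs. their closed form (first-order arc-cosine kernel).
import numpy as np

rng = np.random.default_rng(6)
d, m = 8, 200000                                   # illustrative dimension / width
x, y = rng.standard_normal(d), rng.standard_normal(d)

W = rng.standard_normal((m, d))                    # one random ReLU layer
phi = lambda v: np.maximum(W @ v, 0.0) / np.sqrt(m)
approx = phi(x) @ phi(y)

# Closed form: E_w[ReLU(w.x) ReLU(w.y)] = ||x|| ||y|| (sin t + (pi - t) cos t) / (2 pi),
# where t is the angle between x and y.
t = np.arccos(np.clip(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)), -1, 1))
exact = np.linalg.norm(x) * np.linalg.norm(y) * (np.sin(t) + (np.pi - t) * np.cos(t)) / (2 * np.pi)
print(approx, exact)
```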

References

Showing 1-10 of 22 references

Oblivious Sketching of High-Degree Polynomial Kernels

TLDR
This work gives a general method for applying sketching solutions developed in numerical linear algebra over the past decade to a tensoring of data points without forming the tensoring explicitly, and leads to the first oblivious sketch for the polynomial kernel with a target dimension that depends only polynomially on the degree of the kernel function.
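
As background, the sketch below implements a minimal TensorSketch-style map (Pham and Pagh), one building block behind oblivious sketches of tensored data; it is not the construction of Ahle et al. (2020), and the sketch dimension m and degree q are illustrative choices.

```python
# TensorSketch-style map: sketch x^{tensor q} without forming it, by
# CountSketching each mode and combining the sketches by FFT convolution.
import numpy as np

def tensor_sketch(X, q, m, rng):
    """Sketch the degree-q tensoring of each row of X down to m dimensions."""
    n, d = X.shape
    h = rng.integers(0, m, size=(q, d))            # hash buckets per mode
    s = rng.choice([-1.0, 1.0], size=(q, d))       # random signs per mode
    prod = np.ones((n, m), dtype=complex)
    for k in range(q):
        cs = np.zeros((n, m))
        np.add.at(cs.T, h[k], (X * s[k]).T)        # CountSketch of mode k (scatter-add)
        prod *= np.fft.fft(cs, axis=1)             # product in Fourier domain = convolution
    return np.real(np.fft.ifft(prod, axis=1))

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 50))
q, m = 2, 4096
Z = tensor_sketch(X, q, m, rng)
K = (X @ X.T) ** q                                  # exact polynomial kernel
print(np.linalg.norm(Z @ Z.T - K) / np.linalg.norm(K))
```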

Recursive Sampling for the Nyström Method

We give the first algorithm for kernel Nyström approximation that runs in linear time in the number of training points and is provably accurate for all kernel matrices, without dependence on regularity or incoherence assumptions.
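
For orientation, the snippet below computes a plain Nyström approximation with uniformly sampled landmarks; the cited method instead selects landmarks by recursive ridge leverage score sampling, and the kernel, bandwidth, and landmark count here are arbitrary choices.

```python
# Baseline Nystrom approximation K ~ C W^+ C^T with uniformly chosen landmarks.
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def nystrom(X, s, rng, gamma=0.5, reg=1e-10):
    """Rank-s Nystrom approximation of the kernel matrix of X."""
    n = X.shape[0]
    idx = rng.choice(n, size=s, replace=False)     # landmark indices (uniform here)
    C = rbf_kernel(X, X[idx], gamma)               # n x s cross-kernel block
    W = rbf_kernel(X[idx], X[idx], gamma)          # s x s landmark block
    return C @ np.linalg.pinv(W + reg * np.eye(s)) @ C.T

rng = np.random.default_rng(2)
X = rng.standard_normal((300, 10))
K = rbf_kernel(X, X)
K_hat = nystrom(X, 60, rng)
print(np.linalg.norm(K - K_hat) / np.linalg.norm(K))
```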

Faster Kernel Ridge Regression Using Sketching and Preconditioning

TLDR
This paper proposes a preconditioning technique based on random feature maps, such as random Fourier features, which have recently emerged as a powerful tool for speeding up and scaling the training of kernel-based methods by resorting to approximations.
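
A minimal sketch of that idea follows: kernel ridge regression is solved by conjugate gradients, preconditioned by $(ZZ^\top + \lambda I)^{-1}$ applied through the Woodbury identity, where $Z$ holds random Fourier features. All sizes and the regularization are illustrative, and this is a simplified version of the approach, not the cited paper's exact construction.

```python
# Random-feature-preconditioned conjugate gradients for kernel ridge regression.
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(3)
n, d, m, lam, gamma = 500, 5, 200, 1e-2, 0.5       # illustrative sizes / parameters
X = rng.standard_normal((n, d))
y = np.sin(X.sum(axis=1))

# Exact Gaussian kernel matrix and a random Fourier feature matrix Z with K ~ Z Z^T.
K = np.exp(-gamma * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, m))
b = rng.uniform(0, 2 * np.pi, size=m)
Z = np.sqrt(2.0 / m) * np.cos(X @ W + b)

# Preconditioner (Z Z^T + lam I)^{-1}, applied via the Woodbury identity:
# (Z Z^T + lam I)^{-1} v = (v - Z (lam I + Z^T Z)^{-1} Z^T v) / lam
inner = np.linalg.inv(lam * np.eye(m) + Z.T @ Z)
M = LinearOperator((n, n), matvec=lambda v: (v - Z @ (inner @ (Z.T @ v))) / lam)

alpha, info = cg(K + lam * np.eye(n), y, M=M, maxiter=100)
print("converged:", info == 0,
      "residual:", np.linalg.norm((K + lam * np.eye(n)) @ alpha - y))
```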

Fast Randomized Kernel Ridge Regression with Statistical Guarantees

TLDR
A version of this approach that comes with running time guarantees as well as improved guarantees on its statistical performance is described, and a fast algorithm is presented to quickly compute coarse approximations to the leverage scores in time linear in the number of samples.

Random Features for Large-Scale Kernel Machines

TLDR
Two sets of random features are explored, convergence bounds on their ability to approximate various radial basis kernels are provided, and it is shown that in large-scale classification and regression tasks linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernel machines.
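
A minimal random Fourier features example for the Gaussian kernel is sketched below; the feature count m and bandwidth gamma are illustrative choices.

```python
# Random Fourier features for the Gaussian kernel exp(-gamma * ||x - y||^2).
import numpy as np

rng = np.random.default_rng(4)
n, d, m, gamma = 300, 6, 2000, 0.5
X = rng.standard_normal((n, d))

# Exact kernel matrix.
K = np.exp(-gamma * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

# Features: w ~ N(0, 2*gamma*I), b ~ Uniform[0, 2*pi),
# z(x) = sqrt(2/m) * cos(W^T x + b), so that z(x) . z(y) ~= k(x, y).
W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, m))
b = rng.uniform(0, 2 * np.pi, size=m)
Z = np.sqrt(2.0 / m) * np.cos(X @ W + b)

print("max abs error:", np.max(np.abs(Z @ Z.T - K)))
```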

Scaling up Kernel Ridge Regression via Locality Sensitive Hashing

TLDR
A simple weighted version of random binning features is introduced and it is shown that the corresponding kernel function generates Gaussian processes of any desired smoothness, leading to efficient algorithms for kernel ridge regression.

Random Fourier Features for Kernel Ridge Regression: Approximation Bounds and Statistical Guarantees

TLDR
The results are twofold: on the one hand, it is shown that random Fourier feature approximation can provably speed up kernel ridge regression under reasonable assumptions; on the other hand, the standard sampling distribution is suboptimal, and sampling from a modified distribution in Fourier space, given by the leverage function of the kernel, yields provably better performance.

Input Sparsity Time Low-rank Approximation via Ridge Leverage Score Sampling

We present a new algorithm for finding a near optimal low-rank approximation of a matrix $A$ in $O(\mathrm{nnz}(A))$ time. Our method is based on a recursive sampling scheme for computing a representative subset of the columns of $A$.
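
For illustration, the snippet below computes column ridge leverage scores exactly and samples columns accordingly; the cited algorithm instead approximates these scores recursively in $O(\mathrm{nnz}(A))$ time, and the rank, regularization, and sample budget here are arbitrary choices.

```python
# Ridge leverage score column sampling for low-rank approximation (scores
# computed exactly here, only to illustrate the sampling distribution).
import numpy as np

def ridge_leverage_scores(A, lam):
    """Column ridge leverage scores tau_i = a_i^T (A A^T + lam I)^{-1} a_i."""
    M = np.linalg.inv(A @ A.T + lam * np.eye(A.shape[0]))
    return np.einsum('ji,jk,ki->i', A, M, A)

rng = np.random.default_rng(8)
n, d, k = 400, 200, 10
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) \
    + 0.01 * rng.standard_normal((n, d))         # approximately rank-k matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
lam = (s[k:] ** 2).sum() / k                     # standard choice: ||A - A_k||_F^2 / k
p = ridge_leverage_scores(A, lam)
p /= p.sum()

cols = rng.choice(d, size=4 * k, replace=True, p=p)    # sample 4k columns (illustrative)
C = A[:, cols] / np.sqrt(4 * k * p[cols])              # rescale sampled columns
Q, _ = np.linalg.qr(C)                                 # project A onto their span
print("sampled-column error:", np.linalg.norm(A - Q @ (Q.T @ A)),
      "best rank-k error:", np.linalg.norm(A - U[:, :k] * s[:k] @ Vt[:k]))
```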

Online Row Sampling

TLDR
This work presents an extremely simple algorithm that approximates $A$ up to multiplicative error $\epsilon$ and additive error $\delta$ using $O(d \log d \log(\epsilon \|A\|_2/\delta)/\epsilon^2)$ online samples, with memory overhead proportional to the cost of storing the spectral approximation.
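
A simplified sketch of that scheme follows: rows arrive one at a time, each row's ridge leverage score is estimated against the rows kept so far, and the row is kept with probability proportional to that score and rescaled. The oversampling constant and the regularization below are illustrative, not the paper's exact settings.

```python
# Simplified online leverage score row sampling: B^T B should roughly
# approximate A^T A while keeping far fewer rows.
import numpy as np

def online_row_sample(A, eps=1.0, delta=1e-3, rng=None):
    """Stream the rows of A once, keeping rows by online ridge leverage score."""
    if rng is None:
        rng = np.random.default_rng()
    d = A.shape[1]
    M = delta * np.eye(d)                        # regularized covariance of kept rows
    c = 4 * np.log(d) / eps ** 2                 # oversampling constant (illustrative)
    kept = []
    for a in A:
        tau = float(a @ np.linalg.solve(M, a))   # online ridge leverage score
        p = min(1.0, c * tau)
        if rng.random() < p:
            row = a / np.sqrt(p)                 # rescale so B^T B estimates A^T A
            kept.append(row)
            M += np.outer(row, row)
    return np.array(kept)

rng = np.random.default_rng(5)
A = rng.standard_normal((20000, 20))
B = online_row_sample(A, rng=rng)
err = np.linalg.norm(B.T @ B - A.T @ A, 2) / np.linalg.norm(A.T @ A, 2)
print("rows kept:", B.shape[0], "of", A.shape[0], "relative spectral error:", round(err, 3))
```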

Concentration Inequalities - A Nonasymptotic Theory of Independence

TLDR
Deep connections with isoperimetric problems are revealed whilst special attention is paid to applications to the supremum of empirical processes.