Corpus ID: 214612256

Scaling up Kernel Ridge Regression via Locality Sensitive Hashing

@article{Kapralov2020ScalingUK,
  title={Scaling up Kernel Ridge Regression via Locality Sensitive Hashing},
  author={Michael Kapralov and Navid Nouri and Ilya P. Razenshteyn and Ameya Velingker and Amir Zandieh},
  journal={ArXiv},
  year={2020},
  volume={abs/2003.09756}
}
Random binning features, introduced in the seminal paper of Rahimi and Recht (2007), are an efficient method for approximating a kernel matrix using locality sensitive hashing. Random binning features provide a very simple and efficient way of approximating the Laplace kernel but unfortunately do not apply to many important classes of kernels, notably ones that generate smooth Gaussian processes, such as the Gaussian kernel and the Matérn kernel. In this paper, we introduce a simple weighted…
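
As context for the abstract above, here is a minimal sketch in Python of the classical (unweighted) random binning features of Rahimi and Recht for the Laplace kernel k(x, y) = exp(-||x - y||_1 / sigma). This is not the paper's weighted variant; the function names, the sigma parameter, and the brute-force collision check are illustrative assumptions rather than anyone's reference implementation.

import numpy as np

def random_binning_features(X, num_grids=200, sigma=1.0, seed=0):
    """Classical random binning features for the Laplace kernel
    k(x, y) = exp(-||x - y||_1 / sigma). Each repetition draws a random
    axis-aligned grid; two points "collide" if they land in the same cell."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    bins = np.empty((n, num_grids), dtype=object)
    for p in range(num_grids):
        # Grid pitch per dimension ~ Gamma(shape=2, scale=sigma), the density
        # delta * exp(-delta / sigma) / sigma^2 induced by the Laplace kernel.
        pitch = rng.gamma(shape=2.0, scale=sigma, size=d)
        # Random offset per dimension, uniform over one cell.
        shift = rng.uniform(0.0, pitch)
        cell = np.floor((X - shift) / pitch).astype(np.int64)
        for i in range(n):
            bins[i, p] = tuple(cell[i])  # the cell index tuple is the hash key
    return bins

def approx_laplace_kernel(bins_x, bins_y):
    """Estimated kernel value: fraction of grids on which the two points collide."""
    return float(np.mean([bx == by for bx, by in zip(bins_x, bins_y)]))

# Sanity check against the exact Laplace kernel on random data.
X = np.random.default_rng(1).normal(size=(5, 3))
B = random_binning_features(X, num_grids=2000, sigma=2.0)
print(approx_laplace_kernel(B[0], B[1]), np.exp(-np.abs(X[0] - X[1]).sum() / 2.0))

For a single grid, the collision probability equals exp(-||x - y||_1 / sigma) in expectation over the pitch and offset, so averaging over independent grids concentrates around the true kernel value; the weighted variant announced in the abstract is motivated by kernels, such as the Gaussian and Matérn kernels, that this basic construction does not cover.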

Citations

Random Gegenbauer Features for Scalable Kernel Methods
TLDR: This work proposes efficient random features for approximating a new and rich class of kernel functions, referred to as Generalized Zonal Kernels (GZK), and proves subspace embedding guarantees for Gegenbauer features, which ensure that these features can be used for approximately solving learning problems such as kernel k-means clustering and kernel ridge regression.
Near Input Sparsity Time Kernel Embeddings via Adaptive Sampling
TLDR: A near-input-sparsity-time algorithm is given for sampling the high-dimensional feature space implicitly defined by a kernel transformation, and its subspace embedding bounds are shown to imply new statistical guarantees for kernel ridge regression.
Fast Sketching of Polynomial Kernels of Polynomial Degree
TLDR: A new oblivious sketch is given that greatly improves the running time of the fastest algorithms for approximating a large family of slow-growing kernels by removing the dependence on the polynomial degree q in the leading-order term.
Learning with Neural Tangent Kernels in Near Input Sparsity Time
TLDR: A near-input-sparsity-time algorithm is given that maps the input data to a randomized low-dimensional feature space so that the inner product of the transformed data approximates their neural tangent kernel (NTK) evaluation.
Generalized Leverage Score Sampling for Neural Networks
TLDR: The equivalence between regularized neural networks and neural tangent kernel ridge regression is proved under initialization by both classical random Gaussian sampling and leverage score sampling.
Sublinear Least-Squares Value Iteration via Locality Sensitive Hashing
TLDR: This work builds connections between the theory of approximate maximum inner product search and the regret analysis of reinforcement learning, and presents the first provable Least-Squares Value Iteration algorithms that achieve runtime complexity sublinear in the number of actions.
Breaking the Linear Iteration Cost Barrier for Some Well-known Conditional Gradient Methods Using MaxIP Data-structures
TLDR: This work provides a formal framework for combining locality-sensitive-hashing-based approximate MaxIP data structures with conditional gradient methods (CGM), and gives the first algorithms whose cost per iteration is sublinear in the number of parameters for many fundamental optimization methods, e.g., Frank-Wolfe, the Herding algorithm, and policy gradient.
Posterior and Computational Uncertainty in Gaussian Processes
TLDR: A new class of methods is developed that provides consistent estimation of the combined uncertainty arising from both the finite number of observed data points and the finite amount of computation expended, and the consequences of ignoring computational uncertainty are demonstrated.

References

Showing 1-10 of 22 references
Faster Kernel Ridge Regression Using Sketching and Preconditioning
TLDR: This paper proposes a preconditioning technique based on random feature maps, such as random Fourier features, which have recently emerged as a powerful tool for speeding up and scaling the training of kernel-based methods by resorting to approximations.
Random Features for Large-Scale Kernel Machines
TLDR: Two sets of random features are explored, convergence bounds are provided on their ability to approximate various radial basis kernels, and it is shown that in large-scale classification and regression tasks, linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernel machines.
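
As a concrete companion to the entry above, here is a minimal random Fourier features sketch in Python for the Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2); the gamma value, feature count, and the sanity check at the end are illustrative assumptions, not the reference's experimental setup.

import numpy as np

def random_fourier_features(X, num_features=1024, gamma=0.5, seed=0):
    """Random Fourier features approximating the Gaussian kernel
    k(x, y) = exp(-gamma * ||x - y||^2) via z(x) . z(y) ~= k(x, y)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies are drawn from the kernel's Fourier transform (a Gaussian),
    # phases uniformly from [0, 2*pi).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

# Z @ Z.T approximates the Gaussian kernel matrix, so a linear method trained
# on Z stands in for the exact kernel machine.
X = np.random.default_rng(1).normal(size=(4, 6))
Z = random_fourier_features(X, num_features=8192, gamma=0.25)
print(np.round(Z @ Z.T, 3))
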
Fast Randomized Kernel Ridge Regression with Statistical Guarantees
TLDR: A version of this approach with running time guarantees as well as improved guarantees on its statistical performance is described, and a fast algorithm is presented to compute coarse approximations to the leverage scores in time linear in the number of samples.
Divide and conquer kernel ridge regression: a distributed algorithm with minimax optimal rates
TLDR: It is established that despite the computational speed-up, statistical optimality is retained: as long as the number of partitions m is not too large, the partition-based estimator achieves the statistical minimax rate over all estimators using the full set of N samples.
Recursive Sampling for the Nyström Method
We give the first algorithm for kernel Nyström approximation that runs in linear time in the number of training points and is provably accurate for all kernel matrices, without dependence on…
Sharp analysis of low-rank kernel matrix approximations
TLDR: This paper shows that in the context of kernel ridge regression, for approximations based on a random subset of columns of the original kernel matrix, the rank p may be chosen to be linear in the degrees of freedom associated with the problem, a quantity which is classically used in the statistical analysis of such methods.
Oblivious Sketching of High-Degree Polynomial Kernels
TLDR: This work gives a general method for applying sketching solutions developed in numerical linear algebra over the past decade to a tensoring of data points without forming the tensoring explicitly, leading to the first oblivious sketch for the polynomial kernel with a target dimension that is only polynomially dependent on the degree of the kernel function.
Optimal Rates for the Regularized Least-Squares Algorithm
TLDR: A complete minimax analysis of the problem is described, showing that the convergence rates obtained by regularized least-squares estimators are indeed optimal over a suitable class of priors defined by the considered kernel.
Posterior consistency of Gaussian process prior for nonparametric binary regression
TLDR: If the covariance kernel has derivatives up to a desired order and the bandwidth parameter of the kernel is allowed to take arbitrarily small values, it is shown that the posterior distribution is consistent in the L1-distance.
Less is More: Nyström Computational Regularization
TLDR: A simple incremental variant of Nyström Kernel Regularized Least Squares is suggested, where the subsampling level implements a form of computational regularization, in the sense that it simultaneously controls regularization and computation.