• Corpus ID: 235367639

The Fast Kernel Transform

John Paul Ryan, Sebastian Ament, Carla P. Gomes, Anil Damle
Kernel methods are a highly effective and widely used collection of modern machine learning algorithms. A fundamental limitation of virtually all such methods is that computations involving the kernel matrix naïvely scale quadratically (e.g., matrix-vector multiplication) or cubically (e.g., solving linear systems) with the size of the dataset N. We propose the Fast Kernel Transform (FKT), a general algorithm to compute matrix-vector multiplications (MVMs) for datasets in moderate dimensions with…
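To make the quadratic cost mentioned in the abstract concrete, the following is a minimal sketch of the naive baseline, not of the FKT itself. It assumes a Gaussian kernel; the function names, the `lengthscale` parameter, and the `block` size are illustrative choices that do not come from the paper.

```python
import numpy as np

def gaussian_kernel_matrix(X, Y, lengthscale=1.0):
    """Dense Gaussian kernel block between rows of X and rows of Y (assumed kernel)."""
    # Squared distances via ||x||^2 + ||y||^2 - 2 x.y, clipped at zero for round-off.
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    np.maximum(sq, 0.0, out=sq)
    return np.exp(-sq / (2.0 * lengthscale**2))

def naive_kernel_mvm(X, v, lengthscale=1.0, block=256):
    """Compute K @ v without storing all of K at once.

    Every pair of points is still evaluated, so the work is O(N^2);
    only the peak memory drops, to O(block * N).
    """
    N = X.shape[0]
    out = np.empty(N)
    for start in range(0, N, block):
        stop = min(start + block, N)
        K_block = gaussian_kernel_matrix(X[start:stop], X, lengthscale)
        out[start:stop] = K_block @ v
    return out
```

Calling `naive_kernel_mvm(X, v)` on an `(N, d)` array `X` and a length-`N` vector `v` returns the same result as forming the full kernel matrix and multiplying, which is the operation that fast MVM algorithms aim to accelerate.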
Linear Time Kernel Matrix Approximation via Hyperspherical Harmonics
We propose a new technique for constructing low-rank approximations of matrices that arise in kernel methods for machine learning. Our approach pairs a novel automatically constructed analytic
Scalable First-Order Bayesian Optimization via Structured Automatic Differentiation
These methods apply to virtually all canonical kernels and automatically extend to complex kernels, like the neural network, radial basis function network, and spectral mixture kernels without any additional derivations, enabling flexible, problem-dependent modeling while scaling first-order BO to high d dimensions.


Automatic online tuning for fast Gaussian summation
This work provides an algorithm that combines tree methods with the Improved Fast Gauss Transform (IFGT) and employs a tree data structure, resulting in four evaluation methods whose performance varies based on the distribution of sources and targets and input parameters such as desired accuracy and bandwidth.
Memory Efficient Kernel Approximation
This paper proposes a new kernel approximation algorithm, Memory Efficient Kernel Approximation (MEKA), which considers both the low-rank and the clustering structure of the kernel matrix, and shows that the resulting algorithm outperforms state-of-the-art low-rank kernel approximation methods in terms of speed, approximation error, and memory usage.
Hierarchically Compositional Kernels for Scalable Nonparametric Learning
Empirical results show that the proposed kernel achieves a matching performance with a smaller $r$ on data sizes up to the order of millions.
Scalable Log Determinants for Gaussian Process Kernel Learning
It is found that Lanczos is generally superior to Chebyshev for kernel learning, and that a surrogate approach can be highly efficient and accurate with popular kernels.
On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning
An algorithm to compute an easily-interpretable low-rank approximation to an n x n Gram matrix G such that computations of interest may be performed more rapidly.
Improved Fast Gauss Transform and Efficient Kernel Density Estimation
An improved fast Gauss transform is developed to efficiently estimate sums of Gaussians in higher dimensions, where a new multivariate expansion scheme and an adaptive space subdivision technique dramatically improve the performance.
Kernel Interpolation for Scalable Structured Gaussian Processes (KISS-GP)
A new structured kernel interpolation (SKI) framework is introduced, which generalises and unifies inducing point methods for scalable Gaussian processes (GPs) and naturally enables Kronecker and Toeplitz algebra for substantial additional gains in scalability.
Fast Direct Methods for Gaussian Processes
This work shows that for the most commonly used covariance functions, the matrix C can be hierarchically factored into a product of block low-rank updates of the identity matrix, yielding an $O(n \log^2 n)$ algorithm for inversion. The factorization also enables the evaluation of the determinant det(C), permitting the direct calculation of probabilities in high dimensions under fairly broad assumptions on the kernel defining K.
Exact Gaussian Processes on a Million Data Points
A scalable approach for exact GPs is developed that leverages multi-GPU parallelization and methods like linear conjugate gradients, accessing the kernel matrix only through matrix multiplication, and is generally applicable, without constraints to grid data or specific kernel classes.
Random Features for Large-Scale Kernel Machines
Two sets of random features are explored, convergence bounds are provided on their ability to approximate various radial basis kernels, and it is shown that in large-scale classification and regression tasks, linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernel machines.
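As an illustration of the random-feature idea summarized in the last entry, here is a minimal sketch of random Fourier features for a Gaussian kernel. It is a generic textbook-style construction under stated assumptions, not code from any of the listed papers; the function name, the `lengthscale` parameter, and the `seed` argument are hypothetical.

```python
import numpy as np

def random_fourier_features(X, num_features, lengthscale=1.0, seed=0):
    """Map X of shape (N, d) to features Z so that Z @ Z.T approximates a Gaussian kernel.

    Assumes the kernel k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2)).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies are drawn from the kernel's spectral density, a Gaussian
    # with standard deviation 1 / lengthscale in each coordinate.
    W = rng.normal(scale=1.0 / lengthscale, size=(d, num_features))
    # Random phases make the cosine features unbiased for the kernel.
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)
```

With features `Z` of shape `(N, D)`, an approximate kernel MVM can be computed as `Z @ (Z.T @ v)` in O(N D) operations, at the price of a Monte Carlo error that shrinks as `D` grows.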