# Random Features for Large-Scale Kernel Machines

@inproceedings{Rahimi2007RandomFF, title={Random Features for Large-Scale Kernel Machines}, author={Ali Rahimi and Benjamin Recht}, booktitle={NIPS}, year={2007} }

To accelerate the training of kernel machines, we propose to map the input data to a randomized low-dimensional feature space and then apply existing fast linear methods. The features are designed so that the inner products of the transformed data are approximately equal to those in the feature space of a user-specified shift-invariant kernel. We explore two sets of random features, provide convergence bounds on their ability to approximate various radial basis kernels, and show that in large…
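The idea in the abstract, that inner products of randomly mapped inputs approximate a shift-invariant kernel, can be illustrated with a minimal sketch of random Fourier features for the Gaussian (RBF) kernel. The function name `random_fourier_features`, the `gamma` parameterization `k(x, y) = exp(-gamma * ||x - y||^2)`, and the cosine-with-random-phase form are assumptions chosen for illustration; this is not code from the paper.

```python
import numpy as np


def random_fourier_features(X, n_features, gamma, rng):
    """Map X (n_samples, d) to Z (n_samples, n_features) so that
    Z @ Z.T approximates exp(-gamma * ||x - y||^2).

    Hypothetical helper for illustration only.
    """
    d = X.shape[1]
    # Frequencies sampled from the Fourier transform of the Gaussian
    # kernel, which is itself Gaussian with variance 2 * gamma.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


def rbf_kernel(X, gamma):
    """Exact Gaussian kernel matrix, for comparison."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))

Z = random_fourier_features(X, n_features=5000, gamma=0.5, rng=rng)
K_approx = Z @ Z.T            # inner products in the random feature space
K_exact = rbf_kernel(X, gamma=0.5)

print("max abs error:", np.abs(K_approx - K_exact).max())
```

Once the data is mapped to `Z`, any fast linear method (for example a linear SVM or ridge regression) can be trained on `Z` in place of a kernel machine on `X`. The approximation error shrinks as `n_features` grows.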

## 2,975 Citations

### Learning Kernels with Random Features

- Computer Science
- NIPS
- 2016

This work presents an efficient optimization problem that learns a kernel in a supervised manner and proves the consistency of the estimated kernel as well as generalization bounds for the class of estimators induced by the optimized kernel.

### Kernel Approximation

- Computer Science
- 2015

This project explores randomized kernel approximation techniques, which project input data points into a high-dimensional feature space and find the optimal hyperplane in that space, using random Fourier features, random feature maps, and the Nyström approximation.

### An Empirical Study on The Properties of Random Bases for Kernel Methods

- Computer Science
- NIPS
- 2017

This work contrasts random features of approximated kernel machines with learned features of neural networks, and presents basis adaptation schemes that allow for a more compact representation, while retaining the generalization properties of kernel machines.

### Scalable Kernel Learning Via the Discriminant Information

- Computer Science
- ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020

This work uses the Discriminant Information criterion, a measure of class separability with a strong connection to discriminant analysis, to develop scalable methods for learning kernel features with high discriminant power.

### Finding Small Sets of Random Fourier Features for Shift-Invariant Kernel Approximation

- Computer Science
- ANNPR
- 2016

A simple test with linear cost is proposed to identify a small set of random Fourier features, substantially reducing the number of generated features for low-rank kernel matrices while largely preserving representation accuracy.

### Large-Scale Minimal Complexity Machines Using Explicit Feature Maps

- Computer Science
- IEEE Transactions on Systems, Man, and Cybernetics: Systems
- 2017

This paper presents a stochastic subgradient descent solver for large-scale machine learning with the Minimal Complexity Machine (MCM); it uses an explicit feature-map approximation of the kernel to improve the scalability of the algorithm.

### Low Dimensional Explicit Feature Maps

- Computer Science
- 2015 IEEE International Conference on Computer Vision (ICCV)
- 2015

A novel method for the data-independent construction of low-dimensional feature maps for shift-invariant and homogeneous kernels, which achieves better approximations at the same dimensionality, or comparable approximations at lower dimensionality of the feature map, compared with state-of-the-art methods.

### Scalable Gaussian Kernel Support Vector Machines with Sublinear Training Time Complexity

- Computer Science
- Inf. Sci.
- 2017

### Large-scale Nonlinear Variable Selection via Kernel Random Features

- Computer Science
- ECML/PKDD
- 2018

This is the first kernel-based variable selection method applicable to large datasets that sidesteps the typical poor scaling properties of kernel methods by mapping the inputs into a relatively low-dimensional space of random features.

### Data Dependent Kernel Approximation using Pseudo Random Fourier Features

- Computer Science
- ArXiv
- 2017

A data-dependent kernel approximation method, coined Pseudo Random Fourier Features (PRFF), is proposed to reduce the number of feature dimensions and to improve prediction performance.

## References

### Sampling Techniques for Kernel Methods

- Computer Science
- NIPS
- 2001

Three randomized techniques for speeding up Kernel Principal Component Analysis are presented, all of which can be viewed as instantiations of the following idea: replace the kernel function k by a "randomized kernel" which behaves like k in expectation.

### Random Projection, Margins, Kernels, and Feature-Selection

- Computer Science
- SLSFS
- 2005

This work discusses how, given a kernel as a black-box function, various forms of random projection can be used to extract an explicit small feature space that captures much of what the kernel is doing.

### Efficient Kernel Machines Using the Improved Fast Gauss Transform

- Computer Science
- NIPS
- 2004

An approximation technique based on the improved fast Gauss transform to reduce the computation to O(N) is presented and an error bound for the approximation is given.

### Fast Query-Optimized Kernel Machine Classification Via Incremental Approximate Nearest Support Vectors

- Computer Science
- ICML
- 2003

This work proposes a new and efficient approach based on treating the kernel machine classifier as a special form of k nearest-neighbor by determining at query-time a good k for each query, based on pre-query analysis guided by the original robust kernel machine.

### Core Vector Machines: Fast SVM Training on Very Large Data Sets

- Computer Science
- J. Mach. Learn. Res.
- 2005

This paper shows that many kernel methods can be equivalently formulated as minimum enclosing ball (MEB) problems in computational geometry, obtains provably approximately optimal solutions using the idea of core sets, and proposes the Core Vector Machine (CVM) algorithm, which can be used with nonlinear kernels and has a time complexity that is linear in m.

### Training linear SVMs in linear time

- Computer Science
- KDD '06
- 2006

A cutting-plane algorithm for training linear SVMs that provably has training time O(sn) for classification problems and O(sn log(n)) for ordinal regression problems, and that is several orders of magnitude faster than decomposition methods like SVM-light on large datasets.

### On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2005

An algorithm to compute an easily-interpretable low-rank approximation to an n x n Gram matrix G such that computations of interest may be performed more rapidly.

### Using Analytic QP and Sparseness to Speed Training of Support Vector Machines

- Computer Science
- NIPS
- 1998

An algorithm for training SVMs, Sequential Minimal Optimization (SMO), which breaks the large QP problem into a series of the smallest possible QP problems; these are solved analytically, so no numerical QP library is required.

### Comments on the "Core Vector Machines: Fast SVM Training on Very Large Data Sets"

- Computer Science
- J. Mach. Learn. Res.
- 2007

It turns out that to some extent, the results contradict those reported in the CVM paper, and some of the experiments are reproduced to clarify the matter.

### Interior-Point Methods for Massive Support Vector Machines

- Computer Science
- SIAM J. Optim.
- 2002

We investigate the use of interior-point methods for solving quadratic programming problems with a small number of linear constraints, where the quadratic term consists of a low-rank update to a…