• Corpus ID: 877929

Random Features for Large-Scale Kernel Machines

@inproceedings{Rahimi2007RandomFF,
  title={Random Features for Large-Scale Kernel Machines},
  author={Ali Rahimi and Benjamin Recht},
  booktitle={NIPS},
  year={2007}
}
To accelerate the training of kernel machines, we propose to map the input data to a randomized low-dimensional feature space and then apply existing fast linear methods. The features are designed so that the inner products of the transformed data are approximately equal to those in the feature space of a user specified shift-invariant kernel. We explore two sets of random features, provide convergence bounds on their ability to approximate various radial basis kernels, and show that in large… 
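
To make the construction concrete, the following is a minimal sketch of such a randomized feature map for the Gaussian (RBF) kernel exp(-gamma * ||x - y||^2), written in NumPy. The function name, the kernel choice, and the toy data are illustrative assumptions, not the paper's reference implementation.

import numpy as np

def random_fourier_features(X, D, gamma, seed=0):
    # Map X (n x d) to D random cosine features whose pairwise inner
    # products approximate the Gaussian kernel exp(-gamma * ||x - y||^2).
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Frequencies are drawn from the kernel's Fourier transform, a Gaussian
    # with variance 2 * gamma; phases are uniform on [0, 2*pi).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Toy check: Z @ Z.T approaches the exact Gram matrix as D grows.
X = np.random.default_rng(1).normal(size=(5, 3))
Z = random_fourier_features(X, D=2000, gamma=0.5)
K_approx = Z @ Z.T

A fast linear method (for example a linear SVM or ridge regression) trained on Z then behaves approximately like its kernelized counterpart, which is the speedup the abstract describes.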

Citations

Learning Kernels with Random Features

TLDR
This work presents an efficient optimization problem that learns a kernel in a supervised manner and proves the consistency of the estimated kernel as well as generalization bounds for the class of estimators induced by the optimized kernel.

Kernel Approximation

TLDR
This project explores kernel approximation techniques based on randomization, which project input data points into a high-dimensional feature space and find the optimal hyperplane in that space using random Fourier features, random feature maps, and the Nyström approximation.

An Empirical Study on The Properties of Random Bases for Kernel Methods

TLDR
This work contrasts random features of approximated kernel machines with learned features of neural networks, and presents basis adaptation schemes that allow for a more compact representation, while retaining the generalization properties of kernel machines.

Scalable Kernel Learning Via the Discriminant Information

  • Mert Al, Zejiang Hou, S. Kung
  • Computer Science
    ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
TLDR
This work utilizes the Discriminant Information criterion, a measure of class separability with a strong connection to Discriminant Analysis, to develop scalable methods that learn kernel features with high discriminant power.

Finding Small Sets of Random Fourier Features for Shift-Invariant Kernel Approximation

TLDR
A simple test with linear cost is proposed to identify a small set of random Fourier features, substantially reducing the number of generated features for low-rank kernel matrices while largely preserving the representation accuracy.

Large-Scale Minimal Complexity Machines Using Explicit Feature Maps

TLDR
This paper presents a stochastic subgradient descent solver for large-scale machine learning with the MCM that uses an explicit feature map-based approximation of the kernel, to improve the scalability of the algorithm.

Low Dimensional Explicit Feature Maps

  • O. Chum
  • Computer Science
    2015 IEEE International Conference on Computer Vision (ICCV)
  • 2015
TLDR
A novel method for data-independent construction of low-dimensional feature maps for shift-invariant and homogeneous kernels, which achieves better approximations at the same dimensionality, or comparable approximations at a lower dimensionality of the feature map, compared with state-of-the-art methods.

Large-scale Nonlinear Variable Selection via Kernel Random Features

TLDR
This is the first kernel-based variable selection method applicable to large datasets that sidesteps the typical poor scaling properties of kernel methods by mapping the inputs into a relatively low-dimensional space of random features.

Data Dependent Kernel Approximation using Pseudo Random Fourier Features

TLDR
A data-dependent kernel approximation method, coined Pseudo Random Fourier Features (PRFF), is proposed to reduce the number of feature dimensions and to improve prediction performance.
...

References

SHOWING 1-10 OF 17 REFERENCES

Sampling Techniques for Kernel Methods

TLDR
All three randomized techniques for speeding up Kernel Principal Component Analysis on three levels can be viewed as instantiations of the following idea: replace the kernel function k by a "randomized kernel" which behaves like k in expectation.
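
As an illustration of the "randomized kernel that behaves like k in expectation" idea, here is a minimal sketch of one such construction, an unbiased sparsification of the Gram matrix. The sampling scheme and names are illustrative assumptions, not necessarily the exact technique analyzed in that paper.

import numpy as np

def sparsified_gram(K, keep_prob, seed=0):
    # Keep each symmetric pair of entries with probability keep_prob and
    # rescale by 1 / keep_prob, so the result equals K in expectation.
    rng = np.random.default_rng(seed)
    mask = rng.random(K.shape) < keep_prob
    mask = np.triu(mask)          # sample the upper triangle once ...
    mask = mask | mask.T          # ... and mirror it to keep symmetry
    return np.where(mask, K / keep_prob, 0.0)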

Random Projection, Margins, Kernels, and Feature-Selection

TLDR
It is discussed how, given a kernel as a black-box function, the authors can use various forms of random projection to extract an explicit small feature space that captures much of what the kernel is doing.

Efficient Kernel Machines Using the Improved Fast Gauss Transform

TLDR
An approximation technique based on the improved fast Gauss transform to reduce the computation to O(N) is presented and an error bound for the approximation is given.

Fast Query-Optimized Kernel Machine Classification Via Incremental Approximate Nearest Support Vectors

TLDR
This work proposes a new and efficient approach based on treating the kernel machine classifier as a special form of k nearest-neighbor by determining at query-time a good k for each query, based on pre-query analysis guided by the original robust kernel machine.

Core Vector Machines: Fast SVM Training on Very Large Data Sets

TLDR
This paper shows that many kernel methods can be equivalently formulated as minimum enclosing ball (MEB) problems in computational geometry, obtains provably approximately optimal solutions using the idea of core sets, and proposes the Core Vector Machine (CVM) algorithm, which can be used with nonlinear kernels and has a time complexity that is linear in the number of training examples m.

Training linear SVMs in linear time

TLDR
A Cutting-Plane Algorithm for training linear SVMs that provably has training time O(sn) for classification problems and O(sn log(n)) for ordinal regression problems, and is several orders of magnitude faster than decomposition methods such as SVM-light on large datasets.

On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning

TLDR
An algorithm to compute an easily-interpretable low-rank approximation to an n x n Gram matrix G such that computations of interest may be performed more rapidly.
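
For reference, a minimal sketch of the basic Nyström construction is shown below, using uniform column sampling; the cited paper analyzes a more careful, norm-based importance sampling of columns, so this simplification is an assumption for illustration only.

import numpy as np

def nystrom_approximation(K, m, seed=0):
    # Rank-m Nystrom approximation of an n x n Gram matrix K:
    # sample m columns and reconstruct K ~= C @ pinv(W) @ C.T.
    rng = np.random.default_rng(seed)
    idx = rng.choice(K.shape[0], size=m, replace=False)
    C = K[:, idx]                   # n x m block of sampled columns
    W = K[np.ix_(idx, idx)]         # m x m block at the sampled rows/columns
    return C @ np.linalg.pinv(W) @ C.T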

Using Analytic QP and Sparseness to Speed Training of Support Vector Machines

TLDR
An algorithm for training SVMs, Sequential Minimal Optimization (SMO), which breaks the large QP problem into a series of smallest-possible QP problems that can be solved analytically, avoiding the need for a numerical QP library.

Comments on the "Core Vector Machines: Fast SVM Training on Very Large Data Sets"

TLDR
It turns out that to some extent, the results contradict those reported in the CVM paper, and some of the experiments are reproduced to clarify the matter.

Interior-Point Methods for Massive Support Vector Machines

We investigate the use of interior-point methods for solving quadratic programming problems with a small number of linear constraints, where the quadratic term consists of a low-rank update to a positive semidefinite matrix.