Nyström Sketches

D. Perry, Braxton Osting, and Ross T. Whitaker
Despite their prolific success, kernel methods become difficult to use in many large-scale unsupervised problems because of the evaluation and storage of the full Gram matrix. Here we overcome this difficulty by proposing a novel approach: compute the optimal small, out-of-sample Nyström sketch, which allows fast approximation of the Gram matrix via the Nyström method. We demonstrate and compare several methods for computing the optimal Nyström sketch and show how this approach outperforms…
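The Nyström method referenced throughout this listing approximates an n × n Gram matrix from a small set of landmark points. A minimal sketch, assuming an RBF kernel and uniform landmark sampling (the paper's optimized out-of-sample sketch is not reproduced here; `rbf_kernel` and `nystrom_approx` are illustrative names):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise squared Euclidean distances, then RBF kernel values.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def nystrom_approx(X, m, gamma=1.0, seed=0):
    """Rank-m Nystrom approximation G ~= C W^+ C^T from m sampled landmarks."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    L = X[idx]                       # landmark points
    C = rbf_kernel(X, L, gamma)      # n x m cross-kernel block
    W = rbf_kernel(L, L, gamma)      # m x m landmark kernel block
    return C @ np.linalg.pinv(W) @ C.T

X = np.random.default_rng(1).normal(size=(200, 5))
G = rbf_kernel(X, X)
G_hat = nystrom_approx(X, m=50)
err = np.linalg.norm(G - G_hat) / np.linalg.norm(G)
```

Only the n × m block `C` and the m × m block `W` are ever formed, which is the storage saving the abstract alludes to.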


On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning
An algorithm to compute an easily interpretable low-rank approximation to an n × n Gram matrix G such that computations of interest may be performed more rapidly.
Fast Randomized Kernel Methods With Statistical Guarantees
A version of this approach that comes with running-time guarantees as well as improved guarantees on its statistical performance is described, and a new notion of the statistical leverage of a data point captures in a fine-grained way the difficulty of the original statistical learning problem.
Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison
It is shown that when there is a large gap in the eigenspectrum of the kernel matrix, approaches based on the Nyström method can yield impressively better generalization error bounds than the approach based on random Fourier features.
Greedy Spectral Embedding
A greedy selection procedure for this subset of m examples, based on the feature-space distance between a candidate example and the span of the previously chosen ones, to estimate the embedding function based on all the data.
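The greedy criterion summarized above, picking the candidate farthest in feature space from the span of the points already chosen, is equivalent to a pivoted incomplete Cholesky factorization of the kernel matrix. A minimal sketch under that interpretation, assuming a precomputed PSD kernel matrix `K` (`greedy_landmarks` is an illustrative name, not from the paper):

```python
import numpy as np

def greedy_landmarks(K, m):
    """Greedily select m pivots of a PSD kernel matrix K, each maximizing the
    squared feature-space distance to the span of the previous picks.
    Returns the chosen indices and a factor L with K ~= L.T @ L (rank m)."""
    n = K.shape[0]
    d = np.diag(K).astype(float).copy()   # residual squared distances to span
    L = np.zeros((m, n))
    chosen = []
    for t in range(m):
        i = int(np.argmax(d))             # farthest point from current span
        chosen.append(i)
        piv = np.sqrt(d[i])
        L[t] = (K[i].astype(float) - L[:t, i] @ L[:t]) / piv
        d = np.maximum(d - L[t] ** 2, 0.0)
    return chosen, L

# Rank-10 Gram matrix: 10 greedy picks reconstruct it exactly.
A = np.random.default_rng(0).normal(size=(20, 10))
K = A @ A.T
chosen, L = greedy_landmarks(K, 10)
```

Each iteration costs O(tn), so selecting m landmarks is O(m^2 n) without ever factoring the full matrix.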
Fastfood: Approximate Kernel Expansions in Loglinear Time
Improvements to Fastfood, an approximation that accelerates kernel methods significantly and achieves accuracy similar to full kernel expansions and Random Kitchen Sinks while being 100x faster and using 1000x less memory, make kernel methods more practical for applications that have large training sets and/or require real-time prediction.
Random Features for Large-Scale Kernel Machines
Two sets of random features are explored, convergence bounds on their ability to approximate various radial basis kernels are provided, and it is shown that in large-scale classification and regression tasks, linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernel machines.
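For the RBF kernel specifically, the random Fourier features construction has a short, standard form: draw frequencies from the kernel's spectral density so that the inner product of the features approximates the kernel in expectation. A minimal sketch (parameter names are illustrative):

```python
import numpy as np

def rff_features(X, D, gamma=1.0, seed=0):
    """Random Fourier features z(x) with E[z(x) . z(y)] = exp(-gamma ||x-y||^2).
    For that RBF kernel, frequencies are drawn from N(0, 2*gamma*I)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))  # random frequencies
    b = rng.uniform(0, 2 * np.pi, size=D)                  # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = np.random.default_rng(1).normal(size=(100, 3))
Z = rff_features(X, D=2000)
K_approx = Z @ Z.T
sq = np.sum(X**2, 1)
K_exact = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T))
err_max = np.max(np.abs(K_approx - K_exact))
```

A linear model trained on `Z` then stands in for the full kernel machine, which is the source of the speedup the summary describes.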
Distributed Adaptive Sampling for Kernel Matrix Approximation
SQUEAK is the first RLS sampling algorithm that never constructs the whole matrix and runs in linear time, storing a dictionary which creates accurate kernel matrix approximations with a number of points that only depends on the effective dimension of the dataset.
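The ridge leverage scores (RLS) that SQUEAK samples by, and the effective dimension that bounds its dictionary size, have a direct definition. A minimal sketch assuming the full kernel matrix is available (which SQUEAK itself is designed to avoid; `ridge_leverage_scores` is an illustrative name):

```python
import numpy as np

def ridge_leverage_scores(K, lam):
    """Ridge leverage scores tau_i = [K (K + lam I)^{-1}]_{ii} of a symmetric
    PSD kernel matrix K; their sum is the effective dimension d_eff(lam)."""
    n = K.shape[0]
    # For symmetric K, (K + lam I)^{-1} K = K (K + lam I)^{-1}, so the
    # diagonal of the solve gives the scores.
    tau = np.diag(np.linalg.solve(K + lam * np.eye(n), K))
    return tau, tau.sum()

# Sanity check: for K = I every score is 1 / (1 + lam).
tau, d_eff = ridge_leverage_scores(np.eye(4), lam=1.0)
```

Points with high scores are "hard" for the ridge problem and are kept in the dictionary with high probability.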
Sparse Kernel Feature Analysis
A new class of feature extractors employing l1 norms in coefficient space, instead of the reproducing kernel Hilbert space in which KPCA was originally formulated, is proposed, allowing it to efficiently extract features that maximize criteria other than the variance, in a way similar to projection pursuit.
The pre-image problem in kernel methods
This paper addresses the problem of finding the pre-image of a feature vector in the feature space induced by a kernel and proposes a new method which directly finds the location of the pre-image based on distance constraints in the feature space.
Incremental Kernel Principal Component Analysis
The basis of the proposed solution lies in computing incremental linear PCA in the kernel induced feature space, and constructing reduced-set expansions to maintain constant update speed and memory usage.