Corpus ID: 18609661

Training-Efficient Feature Map for Shift-Invariant Kernels

Xixian Chen, Haiqin Yang, Irwin King, Michael R. Lyu
Random feature maps are widely used to scale up kernel methods. However, employing a large number of mapped features to ensure an accurate approximation still makes training time-consuming. In this paper, we aim to improve the training efficiency of shift-invariant kernels by using fewer, informative features without sacrificing precision. We propose a novel feature map method by extending Random Kitchen Sinks through fast data-dependent subspace embedding to generate the desired…
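For orientation, the following is a minimal sketch of the standard Random Kitchen Sinks (random Fourier features) baseline that the abstract says the paper extends. It is not the paper's proposed data-dependent method, whose details are not given here. The function names (`make_rff_map`, `gaussian_kernel`, `approx_kernel`), the choice of a Gaussian kernel, and the parameter values are assumptions made for illustration.

```python
import math
import random


def make_rff_map(dim, num_features, sigma=1.0, seed=0):
    """Build a random Fourier feature map for the Gaussian kernel
    k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).

    Hypothetical helper for illustration: frequencies are drawn from
    N(0, 1/sigma^2) per coordinate and phases from Uniform[0, 2*pi].
    """
    rng = random.Random(seed)
    weights = [
        [rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
        for _ in range(num_features)
    ]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x):
        # z(x)_i = sqrt(2 / D) * cos(w_i . x + b_i)
        return [
            scale * math.cos(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return feature_map


def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian kernel value, used as the reference."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def approx_kernel(feature_map, x, y):
    """Approximate k(x, y) by the inner product of the mapped features."""
    return sum(a * b for a, b in zip(feature_map(x), feature_map(y)))


if __name__ == "__main__":
    phi = make_rff_map(dim=3, num_features=2000, sigma=1.0, seed=0)
    x, y = [0.1, 0.2, 0.3], [0.4, -0.1, 0.0]
    print("exact  :", gaussian_kernel(x, y))
    print("approx :", approx_kernel(phi, x, y))
```

The approximation error of this baseline shrinks roughly as one over the square root of the number of features, which is why accurate approximations need many features and why training on them becomes costly, the issue the abstract raises.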


Random Features for Shift-Invariant Kernels with Moment Matching
This paper presents a novel sampling algorithm powered by moment matching techniques to reduce the variance of random features and proves the superiority of the proposed algorithm in Gram matrix approximation and generalization errors in regression.
Utilize Old Coordinates: Faster Doubly Stochastic Gradients for Kernel Methods
Two algorithms are proposed to remedy the scalability issue of kernel methods by "utilizing" old random features instead of adding new features in certain iterations; the resulting procedure is surprisingly simple, does not increase the complexity of the original algorithm, and is effective in practice.
Quadrature-based features for kernel approximation
A unifying approach is proposed that reinterprets previous random feature methods and extends them to better estimates of the kernel approximation; the convergence behaviour is derived, and an extensive empirical study supports the hypothesis.
Making Online Sketching Hashing Even Faster
This work builds on online sketching hashing (OSH) and presents a FasteR Online Sketching Hashing (FROSH) algorithm that sketches the data in a more compact form via an independent transformation, with theoretical justification that FROSH consumes less time than OSH and achieves comparable sketching precision under the same memory cost.
Effective Data-Aware Covariance Estimator From Compressed Data
The proposed DACE is extended to tackle multiclass classification problems with theoretical justification, and extensive experiments on both synthetic and real-world data sets demonstrate its superior performance.


Compact Random Feature Maps
Error bounds for CRAFTMaps are proved, demonstrating their superior kernel reconstruction performance compared to previous approximation schemes, and it is shown how structured random matrices can be used to generate CRAFTMaps efficiently.
Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels
A new discrepancy measure called box discrepancy is derived from theoretical characterizations of the integration error with respect to a given sequence, and explicit box discrepancy minimization is used in Quasi-Monte Carlo (QMC) approximations.
Random Features for Large-Scale Kernel Machines
Two sets of random features are explored, convergence bounds are provided on their ability to approximate various radial basis kernels, and it is shown that in large-scale classification and regression tasks linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernel machines.
Efficient Sparse Generalized Multiple Kernel Learning
This paper proposes a generalized MKL model with a constraint on a linear combination of the ℓ1-norm and the squared ℓ2-norm on the kernel weights to seek the optimal kernel combination weights, which enjoys a favorable sparsity property in the solution and also facilitates the grouping effect.
On the Complexity of Learning with Kernels
There are kernel learning problems where no such method will lead to non-trivial computational savings, and lower bounds on the error attainable by such methods as a function of the number of entries observed in the kernel matrix or the rank of an approximate kernel matrix are studied.
Scalable Kernel Methods via Doubly Stochastic Gradients
An approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients" based on the fact that many kernel methods can be expressed as convex optimization problems, which can readily scale kernel methods up to the regimes which are dominated by neural nets.
On the Impact of Kernel Approximation on Learning Accuracy
Stability bounds based on the norm of the kernel approximation for these algorithms, including SVMs, KRR, and graph Laplacian-based regularization algorithms, are given to determine the degree of approximation that can be tolerated in the estimation of the kernel matrix.
Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison
It is shown that when there is a large gap in the eigen-spectrum of the kernel matrix, approaches based on the Nyström method can yield a substantially better generalization error bound than approaches based on random Fourier features.
Sparse Learning Under Regularization Framework
This thesis develops a novel online learning framework to solve group lasso and multi-task feature selection and proposes a generalized MKL (GMKL) model by introducing an elastic net-type constraint on the kernel weights to seek the optimal kernel combination weights.
Fastfood: Approximate Kernel Expansions in Loglinear Time
Improvements to Fastfood, an approximation that accelerates kernel methods significantly and achieves similar accuracy to full kernel expansions and Random Kitchen Sinks while being 100x faster and using 1000x less memory, make kernel methods more practical for applications that have large training sets and/or require real-time prediction.