Training-Efficient Feature Map for Shift-Invariant Kernels
@inproceedings{Chen2015TrainingEfficientFM, title={Training-Efficient Feature Map for Shift-Invariant Kernels}, author={Xixian Chen and Haiqin Yang and Irwin King and Michael R. Lyu}, booktitle={IJCAI}, year={2015} }
Random feature maps are widely used to scale up kernel methods. However, employing a large number of mapped features to ensure an accurate approximation still makes training time-consuming. In this paper, we aim to improve the training efficiency of shift-invariant kernels by using fewer informative features without sacrificing precision. We propose a novel feature map method that extends Random Kitchen Sinks through fast data-dependent subspace embedding to generate the desired…
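The Random Kitchen Sinks baseline that the proposed method builds on can be illustrated with a minimal sketch of random Fourier features for a Gaussian (RBF) kernel. The feature dimension D, bandwidth sigma, and function names below are illustrative choices, not details from the paper; the paper's data-dependent subspace embedding is not reproduced here.

```python
# Minimal sketch of Random Kitchen Sinks (random Fourier features)
# approximating the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
import numpy as np

def rks_features(X, D=512, sigma=1.0, rng=None):
    """Map X (n x d) to D random Fourier features."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    W = rng.normal(scale=1.0 / sigma, size=(d, D))   # frequencies ~ N(0, sigma^-2)
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)        # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# z(x) . z(y) approximates k(x, y), so a linear model on z(X) stands in
# for the kernel machine at a cost that grows with D.
X = np.random.randn(100, 5)
Z = rks_features(X, D=1024, sigma=2.0, rng=0)
K_approx = Z @ Z.T
K_exact = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / (2 * 2.0 ** 2))
print(np.abs(K_approx - K_exact).max())
```

Because the training cost of the downstream linear model grows with D, methods that reach the same approximation quality with fewer mapped features directly reduce training time, which is the motivation stated in the abstract.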
7 Citations
Random Features for Shift-Invariant Kernels with Moment Matching
- Computer Science, AAAI
- 2017
This paper presents a novel sampling algorithm powered by moment matching techniques to reduce the variance of random features and proves the superiority of the proposed algorithm in Gram matrix approximation and generalization errors in regression.
Utilize Old Coordinates: Faster Doubly Stochastic Gradients for Kernel Methods
- Computer Science, UAI
- 2016
Two algorithms are proposed to remedy the scalability issue of kernel methods by "utilizing" old random features instead of adding new features in certain iterations; the resulting procedure is surprisingly simple, does not increase the complexity of the original algorithm, and is effective in practice.
Quadrature-based features for kernel approximation
- Computer Science, Mathematics, NeurIPS
- 2018
A unifying approach is proposed that reinterprets previous random features methods and extends them to better estimates of the kernel approximation; the convergence behaviour is derived and an extensive empirical study supports the hypothesis.
Making Online Sketching Hashing Even Faster
- Computer Science, IEEE Transactions on Knowledge and Data Engineering
- 2021
This work utilizes online sketching hashing (OSH) and presents a FasteR Online Sketching Hashing (FROSH) algorithm to sketch the data in a more compact form via an independent transformation, with theoretical justification guaranteeing that the proposed FROSH consumes less time and achieves comparable sketching precision under the same memory cost as OSH.
Maximum margin semi-supervised learning with irrelevant data
- Computer Science, Neural Networks
- 2015
Effective Data-Aware Covariance Estimator From Compressed Data
- Computer Science, IEEE Transactions on Neural Networks and Learning Systems
- 2020
The proposed DACE is extended to tackle multiclass classification problems with theoretical justification, and extensive experiments on both synthetic and real-world data sets demonstrate its superior performance.
Faster doubly stochastic functional gradient by gradient preconditioning for scalable kernel methods
- Computer Science, Appl. Intell.
- 2022
References (showing 1-10 of 41)
Compact Random Feature Maps
- Computer Science, ICML
- 2014
Error bounds for CRAFTMaps are proved, demonstrating their superior kernel reconstruction performance compared to previous approximation schemes, and it is shown how structured random matrices can be used to generate CRAFTMaps efficiently.
Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels
- Computer Science, Mathematics, J. Mach. Learn. Res.
- 2016
A new discrepancy measure called box discrepancy is derived from theoretical characterizations of the integration error with respect to a given sequence, and Quasi-Monte Carlo (QMC) feature maps are constructed by explicit box discrepancy minimization.
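As a rough illustration of the quasi-Monte Carlo idea in the entry above, the sketch below swaps the i.i.d. Gaussian frequencies of ordinary random Fourier features for a scrambled Halton sequence pushed through the inverse Gaussian CDF. The sequence choice, dimensions, and names are assumptions for illustration; the box-discrepancy minimization proposed in the paper is not implemented here.

```python
# Quasi-Monte Carlo variant of random Fourier features: low-discrepancy
# Halton points replace i.i.d. Gaussian frequency draws.
import numpy as np
from scipy.stats import norm, qmc

def qmc_rff_features(X, D=512, sigma=1.0, seed=0):
    """Random-Fourier-style features with quasi-random (Halton) frequencies."""
    n, d = X.shape
    halton = qmc.Halton(d=d, scramble=True, seed=seed)
    U = halton.random(D)                          # D low-discrepancy points in [0, 1)^d
    W = norm.ppf(U).T / sigma                     # (d, D) quasi-random Gaussian frequencies
    b = np.random.default_rng(seed).uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```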
Random Features for Large-Scale Kernel Machines
- Computer Science, NIPS
- 2007
Two sets of random features are explored, convergence bounds are provided on their ability to approximate various radial basis kernels, and it is shown that in large-scale classification and regression tasks linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernel machines.
Efficient Sparse Generalized Multiple Kernel Learning
- Computer Science, IEEE Transactions on Neural Networks
- 2011
This paper proposes a generalized MKL model with a constraint on a linear combination of the ℓ1-norm and the squared ℓ2-norm on the kernel weights to seek the optimal kernel combination weights, which enjoys a favorable sparsity property in the solution and also facilitates the grouping effect.
On the Complexity of Learning with Kernels
- Computer Science, COLT
- 2015
Lower bounds on the error attainable by such methods are studied as a function of the number of entries observed in the kernel matrix or the rank of an approximate kernel matrix, showing that there are kernel learning problems where no such method will lead to non-trivial computational savings.
Scalable Kernel Methods via Doubly Stochastic Gradients
- Computer Science, NIPS
- 2014
An approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients", exploiting the fact that many kernel methods can be expressed as convex optimization problems; it can readily scale kernel methods up to regimes that are dominated by neural nets.
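A hedged sketch of the doubly stochastic functional gradient idea from the entry above: each iteration samples one training point and one random Fourier feature, then takes a functional gradient step. The squared loss, step-size schedule, regularization weight nu, and all names are illustrative assumptions, not details from the paper.

```python
# Doubly stochastic gradient sketch: randomness over both data points and
# random features, with one coefficient added per iteration.
import numpy as np

def doubly_stochastic_sgd(X, y, T=2000, sigma=1.0, nu=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=1.0 / sigma, size=(T, d))    # one random frequency per iteration
    B = rng.uniform(0.0, 2.0 * np.pi, size=T)
    alpha = np.zeros(T)                               # one coefficient per sampled feature

    for t in range(T):
        i = rng.integers(n)                           # sample a data point
        phi_t = np.sqrt(2.0) * np.cos(W[:t] @ X[i] + B[:t])
        f_xi = alpha[:t] @ phi_t if t > 0 else 0.0    # current prediction at x_i
        grad = f_xi - y[i]                            # derivative of 0.5 * (f(x) - y)^2
        gamma = 1.0 / (1.0 + nu * (t + 1))            # decaying step size (illustrative)
        alpha[:t] *= (1.0 - gamma * nu)               # shrink old coefficients (regularizer)
        alpha[t] = -gamma * grad * np.sqrt(2.0) * np.cos(W[t] @ X[i] + B[t])
    return W, B, alpha

def predict(x, W, B, alpha):
    return alpha @ (np.sqrt(2.0) * np.cos(W @ x + B))
```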
On the Impact of Kernel Approximation on Learning Accuracy
- Computer Science, AISTATS
- 2010
Stability bounds based on the norm of the kernel approximation for these algorithms, including SVMs, KRR, and graph Laplacian-based regularization algorithms, are given to determine the degree of approximation that can be tolerated in the estimation of the kernel matrix.
Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison
- Computer Science, NIPS
- 2012
It is shown that when there is a large gap in the eigen-spectrum of the kernel matrix, approaches based on the Nyström method can yield an impressively better generalization error bound than the random Fourier features based approach.
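For context on the comparison above, here is a hedged sketch of a basic Nyström feature map built from m sampled landmark points; it approximates the same Gram matrix that random Fourier features approximate, but in a data-dependent way. The landmark count, RBF kernel, and names are illustrative, not taken from the paper.

```python
# Nystrom feature map: K is approximated by Z @ Z.T with Z = K(X, L) @ W^{-1/2},
# where L are m sampled landmark points and W = K(L, L).
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def nystrom_features(X, m=50, sigma=1.0, rng=None):
    rng = np.random.default_rng(rng)
    landmarks = X[rng.choice(len(X), size=m, replace=False)]
    C = rbf_kernel(X, landmarks, sigma)               # n x m cross-kernel
    W = rbf_kernel(landmarks, landmarks, sigma)       # m x m landmark kernel
    vals, vecs = np.linalg.eigh(W)
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, 1e-12))) @ vecs.T
    return C @ inv_sqrt                               # n x m feature map
```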
Sparse Learning Under Regularization Framework
- Computer Science
- 2011
This thesis develops a novel online learning framework to solve group lasso and multi-task feature selection and proposes a generalized MKL (GMKL) model by introducing an elastic net-type constraint on the kernel weights to seek the optimal kernel combination weights.
Fastfood: Approximate Kernel Expansions in Loglinear Time
- Computer Science, ICML
- 2013
Improvements to Fastfood, an approximation that accelerates kernel methods significantly and achieves similar accuracy to full kernel expansions and Random Kitchen Sinks while being 100x faster and using 1000x less memory, make kernel methods more practical for applications that have large training sets and/or require real-time prediction.
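A hedged sketch of the Fastfood construction described above: the dense Gaussian frequency matrix of Random Kitchen Sinks is replaced by a product of diagonal, permutation, and Walsh-Hadamard matrices (S H G Π H B). For clarity the sketch multiplies an explicit Hadamard matrix, so it does not show the O(d log d) speed-up obtained with the fast Walsh-Hadamard transform; the padding, single-block setup, and names are illustrative simplifications.

```python
# One Fastfood block: structured frequencies V = S H G Pi H B / (sigma * sqrt(p)).
import numpy as np
from scipy.linalg import hadamard

def fastfood_block(X, sigma=1.0, rng=None):
    rng = np.random.default_rng(rng)
    n, d = X.shape
    p = 1 << (d - 1).bit_length()                     # pad dimension to a power of two
    Xp = np.pad(X, ((0, 0), (0, p - d)))              # zero-pad the inputs
    H = hadamard(p)                                   # Walsh-Hadamard matrix (+/-1 entries)
    B = rng.choice([-1.0, 1.0], size=p)               # random sign flips
    Pi = rng.permutation(p)                           # random permutation
    g = rng.normal(size=p)                            # Gaussian diagonal
    s = np.sqrt(rng.chisquare(p, size=p)) / np.linalg.norm(g)  # row-length correction
    V = (s[:, None] * H) @ np.diag(g) @ (H[Pi] * B) / (sigma * np.sqrt(p))
    b = rng.uniform(0.0, 2.0 * np.pi, size=p)         # random phases
    return np.sqrt(2.0 / p) * np.cos(Xp @ V.T + b)    # p random features for this block
```

In practice several independent blocks are stacked to reach the desired feature dimension, and the structured products are computed with the fast Walsh-Hadamard transform instead of an explicit matrix.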