Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond

@article{Liu2021RandomFF,
  title={Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond},
  author={F. Liu and Xiaolin Huang and Yudong Chen and J. Suykens},
  journal={IEEE transactions on pattern analysis and machine intelligence},
  year={2021},
  volume={PP}
}
  • F. Liu, Xiaolin Huang, Yudong Chen, J. Suykens
  • Published 2021
  • Mathematics, Computer Science
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
Random features are one of the most popular techniques for speeding up kernel methods in large-scale problems. Related work has been recognized with the NeurIPS Test-of-Time Award in 2017 and as an ICML Best Paper Finalist in 2019. The body of work on random features has grown rapidly, so a comprehensive overview of this topic, explaining the connections among the various algorithms and theoretical results, is desirable. In this survey, we systematically review the work on random features…
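As a concrete reference point for the technique the survey covers, below is a minimal sketch (not taken from the survey; the bandwidth sigma, feature count D, and data are illustrative) of the classical random Fourier feature construction for a Gaussian kernel.

```python
# Minimal sketch of random Fourier features for a Gaussian kernel
# k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)); sigma and D are illustrative.
import numpy as np

def rff_map(X, D=2000, sigma=1.0, seed=0):
    """Map X of shape (n, d) to D random features so that z(x) @ z(y) ~ k(x, y)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies drawn from the (Gaussian) spectral density of the kernel.
    W = rng.normal(scale=1.0 / sigma, size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Quick sanity check against the exact Gram matrix on random data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
Z = rff_map(X)
K_exact = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 2.0)
print("max abs error:", np.abs(Z @ Z.T - K_exact).max())
```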
On the Approximation Lower Bound for Neural Nets with Random Weights
TLDR: It is shown that, despite the well-known fact that a shallow neural network is a universal approximator, a random net cannot achieve zero approximation error even for smooth functions, and it is proved that if the proposal distribution is compactly supported, the lower bound on this error is strictly positive.
Fast Learning in Reproducing Kernel Krein Spaces via Generalized Measures
In this paper, we attempt to solve a long-standing open question about non-positive definite (non-PD) kernels: can a given non-PD kernel be decomposed into the difference of two PD kernels (termed positive decomposition)? …
Kernel approximation on algebraic varieties
TLDR: The main technical insight is to approximate smooth kernels by polynomial kernels and to leverage two key properties of polynomial kernels that hold when they are restricted to a variety.
Learning Data-adaptive Nonparametric Kernels
TLDR: A Data-Adaptive Nonparametric Kernel (DANK) learning framework is proposed, which imposes an adaptive matrix on the kernel/Gram matrix in an entry-wise strategy and outperforms other representative kernel-learning-based algorithms on various classification and regression benchmark datasets.
Global Convergence and Induced Kernels of Gradient-Based Meta-Learning with Neural Nets
TLDR: It is proved that GBML is equivalent to a functional gradient descent operation that explicitly propagates experience from the past tasks to new ones, and a new kernel-based meta-learning approach is developed that outperforms GBML with standard DNNs on the Omniglot dataset when the number of past tasks for meta-training is small.
Shallow Representation is Deep: Learning Uncertainty-aware and Worst-case Random Feature Dynamics
TLDR: Viewing the whole dynamical system as a multi-layer neural network, it is shown that finding worst-case dynamics realizations using Pontryagin’s minimum principle is equivalent to performing the Frank-Wolfe algorithm on the deep net.
Sample and Computationally Efficient Simulation Metamodeling in High Dimensions
TLDR: This work develops a novel methodology that dramatically alleviates the curse of dimensionality, and demonstrates via extensive numerical experiments that the methodology can handle problems with a design space of hundreds of dimensions, improving both prediction accuracy and computational efficiency by orders of magnitude relative to typical alternative methods in practice.
An Insect-Inspired Randomly, Weighted Neural Network with Random Fourier Features For Neuro-Symbolic Relational Learning
The computer-science field of Knowledge Representation and Reasoning (KRR) aims to understand, reason, and interpret knowledge as efficiently as human beings do. Because many logical formalisms and…
Fast Learning in Reproducing Kernel Krein Spaces via Signed Measures
TLDR: This paper casts the question from a distributional view by introducing signed measures, which transforms positive decomposition into measure decomposition: a series of non-PD kernels can be associated with the linear combination of specific finite Borel measures, and a sufficient and necessary condition is provided to answer this open question.
Kernel regression in high dimension: Refined analysis beyond double descent
TLDR: This refined analysis goes beyond the double descent theory by showing that, depending on the data eigen-profile and the level of regularization, the kernel regression risk curve can be a double-descent-like, bell-shaped, or monotonic function of $n$.

References

Showing 1-10 of 213 references
On Data-Dependent Random Features for Improved Generalization in Supervised Learning
TLDR: This paper proposes the Energy-based Exploration of Random Features (EERF) algorithm, based on a data-dependent score function that explores the set of possible features and exploits the promising regions, and proves that with high probability the proposed score function recovers the spectrum of the best fit within the model class.
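A rough sketch of what such a data-dependent selection could look like is given below; the label-alignment score used here is an illustrative stand-in, not the exact EERF criterion from the paper.

```python
# Hedged sketch of data-dependent random feature selection: score a pool of
# candidate features by their alignment with the labels and keep the best ones.
# This scoring rule is illustrative, not the exact EERF score.
import numpy as np

def select_random_features(X, y, pool_size=2000, keep=200, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, pool_size))
    b = rng.uniform(0.0, 2.0 * np.pi, size=pool_size)
    Phi = np.cos(X @ W + b)                   # candidate features, shape (n, pool_size)
    scores = np.abs(Phi.T @ (y - y.mean()))   # data-dependent score per candidate
    top = np.argsort(scores)[-keep:]          # exploit the most promising candidates
    return W[:, top], b[top]
```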
A General Scoring Rule for Randomized Kernel Approximation with Application to Canonical Correlation Analysis
TLDR: A general scoring rule for sampling random features is proposed, which can be employed for various applications with some adjustments; it provides a principled guide for finding the distribution maximizing the canonical correlations, resulting in a novel data-dependent method for sampling features.
Random Features for Shift-Invariant Kernels with Moment Matching
TLDR: This paper presents a novel sampling algorithm powered by moment-matching techniques to reduce the variance of random features, and proves the superiority of the proposed algorithm in Gram matrix approximation and generalization errors in regression.
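One simple instance of the moment-matching idea is to re-standardize the sampled frequencies so that their empirical moments match the target spectral distribution exactly; the sketch below shows this device and is not claimed to be the paper's algorithm.

```python
# Hedged sketch of one moment-matching device for random Fourier features:
# re-standardize the sampled frequencies so their empirical mean and covariance
# match the target Gaussian exactly. The cited paper's algorithm may differ.
import numpy as np

def moment_matched_frequencies(d, D, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(D, d))                      # raw draws from N(0, I)
    W -= W.mean(axis=0)                              # match the first moment exactly
    cov = W.T @ W / D
    W = W @ np.linalg.cholesky(np.linalg.inv(cov))   # whiten: match the second moment
    return W / sigma                                 # rescale to the kernel bandwidth
```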
Optimal Rates for Random Fourier Features
TLDR: A detailed finite-sample theoretical analysis of the approximation quality of RFFs is provided by establishing optimal (in terms of the RFF dimension and the growing set size) performance guarantees in uniform norm, and by presenting guarantees in Lr (1 ≤ r < ∞) norms.
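The flavour of such guarantees, a uniform error decaying roughly like 1/sqrt(D) up to logarithmic and dimension factors, can be observed empirically; the illustrative snippet below (Gaussian kernel with unit bandwidth, not from the paper) prints the worst-case approximation error over a fixed sample as the feature dimension D grows.

```python
# Illustrative-only experiment: worst-case RFF approximation error over a fixed
# sample shrinks roughly like 1/sqrt(D); the cited analysis makes this precise.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(300, 3))
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 2.0)  # sigma = 1

for D in (100, 400, 1600, 6400):
    W = rng.normal(size=(3, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)
    print(D, np.abs(Z @ Z.T - K).max())
```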
Data-driven Random Fourier Features using Stein Effect
TLDR: A novel shrinkage estimator based on the "Stein effect" is presented, which provides a data-driven weighting strategy for random features and enjoys theoretical justification in terms of lowering the empirical risk, together with an efficient randomized algorithm for large-scale applications of the proposed method.
Towards a Unified Analysis of Random Fourier Features
TLDR: This work provides the first unified risk analysis of learning with random Fourier features using the squared error and Lipschitz-continuous loss functions, and devises a simple approximation scheme which provably reduces the computational cost without loss of statistical efficiency.
Data-dependent compression of random features for large-scale kernel approximation
TLDR: This work proposes to combine the simplicity and generality of RFMs with a data-dependent feature selection scheme to achieve the desirable theoretical approximation properties of Nyström with just O(log J+) features, and shows that the method achieves small kernel matrix approximation error and better test-set accuracy with provably fewer random features than state-of-the-art methods.
Random Fourier Features via Fast Surrogate Leverage Weighted Sampling
TLDR: A fast surrogate leverage weighted sampling strategy is proposed to generate refined random Fourier features for kernel approximation; theoretical guarantees are provided on the generalization performance of this approach, in particular characterizing the number of random features required to achieve statistical guarantees in KRR.
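The bookkeeping behind weighted (importance-sampled) random features can be sketched as below; the inflated Gaussian proposal used here is only a placeholder, not the surrogate leverage scores of the cited paper.

```python
# Hedged sketch of importance-weighted random Fourier features: frequencies are
# drawn from a proposal q and features are rescaled by sqrt(p/q) so the kernel
# estimate stays unbiased. The proposal below is a placeholder for a
# leverage-score-based proposal.
import numpy as np

def weighted_rff(X, D=500, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    s_p, s_q = 1.0 / sigma, 1.5 / sigma          # target and proposal scales
    W = rng.normal(scale=s_q, size=(d, D))       # draw frequencies from q
    sq_norm = (W ** 2).sum(axis=0)
    # log p(w) - log q(w) for the two isotropic Gaussians.
    log_ratio = d * np.log(s_q / s_p) + 0.5 * sq_norm * (1.0 / s_q**2 - 1.0 / s_p**2)
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)
    return Z * np.exp(0.5 * log_ratio)           # reweight each feature column
```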
On the Error of Random Fourier Features
TLDR: The uniform error bound of the original random Fourier features paper is improved, and novel understandings are given of the embedding's variance, approximation error, and use in some machine learning methods.
Simple and Almost Assumption-Free Out-of-Sample Bound for Random Feature Mapping
TLDR: This paper studies kernel ridge regression with random feature mapping (RFM-KRR) and establishes novel out-of-sample error upper and lower bounds; the analysis is based entirely on elementary linear algebra and is thereby easy to read and verify.
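For orientation, a minimal sketch of the generic RFM-KRR setup (random Fourier features followed by ridge regression in the feature space; hyperparameter names and values are illustrative, not the paper's):

```python
# Minimal sketch of kernel ridge regression with a random feature mapping
# (RFM-KRR): the ridge problem is solved in the D-dimensional feature space
# rather than the n x n kernel space; hyperparameters are illustrative.
import numpy as np

def rfm_krr_fit(X, y, D=500, sigma=1.0, lam=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)
    # Ridge solution (Z^T Z + lam I) alpha = Z^T y, at O(n D^2 + D^3) cost.
    alpha = np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)
    return W, b, alpha

def rfm_krr_predict(model, X_new):
    W, b, alpha = model
    D = W.shape[1]
    Z_new = np.sqrt(2.0 / D) * np.cos(X_new @ W + b)
    return Z_new @ alpha
```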