Corpus ID: 211076290

Generalization Guarantees for Sparse Kernel Approximation with Entropic Optimal Features

@article{Ding2020GeneralizationGF,
  title={Generalization Guarantees for Sparse Kernel Approximation with Entropic Optimal Features},
  author={Liang Ding and Rui Tuo and Shahin Shahrampour},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.04195}
}
Despite their success, kernel methods suffer from a massive computational cost in practice. In this paper, in lieu of the commonly used kernel expansion with respect to $N$ inputs, we develop a novel optimal design maximizing the entropy among kernel features. This procedure results in a kernel expansion with respect to entropic optimal features (EOF), improving the data representation dramatically due to feature dissimilarity. Under mild technical assumptions, our generalization bound shows that… 
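
One way to picture an "optimal design maximizing the entropy among kernel features" is the classical greedy maximum-entropy design for Gaussian processes, where the entropy of the features at a selected set of points is, up to constants, the log-determinant of their kernel submatrix. The sketch below is a minimal illustration under that assumption only; the RBF kernel choice, the greedy rule, and the names rbf_kernel / greedy_entropy_design are hypothetical and are not the authors' EOF procedure or its guarantees.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    """Squared-exponential kernel matrix between rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def greedy_entropy_design(X, m, lengthscale=1.0, jitter=1e-8):
    """Greedily pick m rows of X whose kernel submatrix has maximal
    log-determinant (Gaussian entropy up to constants): at each step,
    add the point with the largest conditional variance given the
    points already selected."""
    n = X.shape[0]
    K = rbf_kernel(X, X, lengthscale) + jitter * np.eye(n)
    var = np.diag(K).copy()      # conditional variance of each candidate
    selected = []
    for _ in range(m):
        i = int(np.argmax(var))  # log-det gain of point i is log(var[i])
        selected.append(i)
        Ks = K[np.ix_(selected, selected)] + jitter * np.eye(len(selected))
        Kc = K[:, selected]
        # Schur complement: variances conditioned on the selected set.
        var = np.diag(K) - np.einsum('ij,jk,ik->i', Kc, np.linalg.inv(Ks), Kc)
        var[selected] = -np.inf  # never reselect a chosen point
    return np.array(selected)

# Usage: pick 20 dissimilar (entropy-optimal) inputs out of 1000 samples,
# then expand the kernel only with respect to those features.
X = np.random.randn(1000, 5)
idx = greedy_entropy_design(X, m=20)
Phi = rbf_kernel(X, X[idx])      # sparse feature expansion, N x 20
```

Greedy log-determinant selection is a standard surrogate here because each step's entropy gain equals the log of the chosen point's conditional variance, so dissimilar points are preferred automatically.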

Citations

Sample and Computationally Efficient Stochastic Kriging in High Dimensions
TLDR
This work develops a novel methodology that dramatically alleviates the curse of dimensionality, and demonstrates via extensive numerical experiments that the methodology can handle problems with a design space of more than 10,000 dimensions, improving both prediction accuracy and computational efficiency by orders of magnitude relative to typical alternative methods in practice.
Kernel Packet: An Exact and Scalable Algorithm for Gaussian Process Regression with Matérn Correlations
TLDR
An exact and scalable algorithm for one-dimensional Gaussian process regression with Matérn correlations whose smoothness parameter 𝜈 is a half-integer, which is significantly superior to the existing alternatives in both computational time and predictive accuracy.
High-Dimensional Simulation Optimization via Brownian Fields and Sparse Grids
TLDR
A new sampling algorithm is proposed that converges to a global optimal solution and suffers minimally from the curse of dimensionality, and dramatically outperforms typical alternatives in practice.
A Sparse Expansion For Deep Gaussian Processes
TLDR
The proposed DTMGP model has the following properties: the outputs of each activation function are deterministic while the weights are chosen independently from a standard Gaussian distribution; in training or prediction, only O(polylog(M)) activation functions have non-zero outputs, which significantly boosts the computational efficiency.
Sample and Computationally Efficient Simulation Metamodeling in High Dimensions
TLDR
This work develops a novel methodology that dramatically alleviates the curse of dimensionality, and demonstrates via extensive numerical experiments that the methodology can handle problems with a design space of hundreds of dimensions, improving both prediction accuracy and computational efficiency by orders of magnitude relative to typical alternative methods in practice.

References

SHOWING 1-10 OF 46 REFERENCES
Learning Bounds for Greedy Approximation with Explicit Feature Maps from Multiple Kernels
TLDR
The empirical results show that given a fixed number of explicit features, the method can achieve a lower test error with a smaller time cost, compared to the state-of-the-art in data-dependent random features.
Data-dependent compression of random features for large-scale kernel approximation
TLDR
This work proposes to combine the simplicity and generality of RFMs with a data-dependent feature selection scheme to achieve the desirable theoretical approximation properties of Nyström with just O(log J+) features, and shows that the method achieves small kernel matrix approximation error and better test set accuracy with provably fewer random features than state-of-the-art methods.
Scalable Learning in Reproducing Kernel Krein Spaces
We provide the first mathematically complete derivation of the Nyström method for low-rank approximation of indefinite kernels and propose an efficient method for finding an approximate…
Sparse Random Feature Algorithm as Coordinate Descent in Hilbert Space
TLDR
A Sparse Random Features algorithm that learns a sparse non-linear predictor by minimizing an l1-regularized objective function over the Hilbert space induced by a kernel function, obtaining a sparse solution that requires less memory and prediction time while maintaining comparable performance on regression and classification tasks.
Non-parametric Group Orthogonal Matching Pursuit for Sparse Learning with Multiple Kernels
TLDR
This paper proposes a Group-OMP based framework for sparse MKL that decouples the sparsity regularizer from the smoothness regularizer (via RKHS norms), leading to better empirical performance and a simpler optimization procedure that only requires a black-box single-kernel solver.
Orthogonal Random Features
We present an intriguing discovery related to Random Fourier Features: replacing multiplication by a random Gaussian matrix with multiplication by a properly scaled random orthogonal matrix…
On Data-Dependent Random Features for Improved Generalization in Supervised Learning
TLDR
This paper proposes the Energy-based Exploration of Random Features (EERF) algorithm, based on a data-dependent score function that explores the set of possible features and exploits the promising regions, and proves that with high probability the proposed score function recovers the spectrum of the best fit within the model class.
A General Scoring Rule for Randomized Kernel Approximation with Application to Canonical Correlation Analysis
TLDR
A general scoring rule for sampling random features, which can be employed for various applications with some adjustments and provides a principled guide for finding the distribution maximizing the canonical correlations, resulting in a novel data-dependent method for sampling features.
Bayesian Nonparametric Kernel-Learning
TLDR
Bayesian nonparametric kernel-learning (BaNK), a generic, data-driven framework for scalable learning of kernels that places a nonparametric prior on the spectral distribution of random frequencies, allowing it to both learn kernels and scale to large datasets.
Learning Kernels with Random Features
TLDR
This work presents an efficient optimization problem that learns a kernel in a supervised manner and proves the consistency of the estimated kernel as well as generalization bounds for the class of estimators induced by the optimized kernel.