# Generalization Guarantees for Sparse Kernel Approximation with Entropic Optimal Features

```bibtex
@article{Ding2020GeneralizationGF,
  title   = {Generalization Guarantees for Sparse Kernel Approximation with Entropic Optimal Features},
  author  = {Liang Ding and Rui Tuo and Shahin Shahrampour},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2002.04195}
}
```

Despite their success, kernel methods suffer from a massive computational cost in practice. In this paper, in lieu of the commonly used kernel expansion with respect to $N$ inputs, we develop a novel optimal design maximizing the entropy among kernel features. This procedure results in a kernel expansion with respect to entropic optimal features (EOF), improving the data representation dramatically due to the dissimilarity of the features. Under mild technical assumptions, our generalization bound shows that…
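The truncated abstract centers on selecting kernel features that maximize entropy, so that the chosen features are maximally dissimilar. As a hedged illustration (not the authors' construction), one can use the fact that the differential entropy of a Gaussian vector grows with the log-determinant of its covariance, so a greedy log-det selection over the kernel matrix serves as a stand-in; all function names below are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of X and Y."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def greedy_entropy_select(K, m):
    """Greedily pick m indices maximizing log det K[S, S].

    The differential entropy of a Gaussian vector is
    0.5 * log det(2*pi*e*Cov), so maximizing the log-determinant of
    the kernel submatrix favors mutually dissimilar features.
    """
    n, selected = K.shape[0], []
    for _ in range(m):
        best, best_val = None, -np.inf
        for j in range(n):
            if j in selected:
                continue
            idx = selected + [j]
            sign, logdet = np.linalg.slogdet(K[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_val:
                best, best_val = j, logdet
        selected.append(best)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
K = rbf_kernel(X, X)
S = greedy_entropy_select(K, 10)  # 10 dissimilar centers instead of all 200
Phi = rbf_kernel(X, X[S])         # sparse kernel expansion in those features
```

Greedy log-det maximization is the classical D-optimal design heuristic; the paper's entropic design may differ in objective and algorithm.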

## 5 Citations

Sample and Computationally Efficient Stochastic Kriging in High Dimensions

- Computer Science
- 2020

This work develops a novel methodology that dramatically alleviates the curse of dimensionality, and demonstrates via extensive numerical experiments that the methodology can handle problems with a design space of more than 10,000 dimensions, improving both prediction accuracy and computational efficiency by orders of magnitude relative to typical alternative methods in practice.

Kernel Packet: An Exact and Scalable Algorithm for Gaussian Process Regression with Matérn Correlations

- Computer Science, ArXiv
- 2022

An exact and scalable algorithm for one-dimensional Gaussian process regression with Matérn correlations whose smoothness parameter ν is a half-integer; it is significantly superior to existing alternatives in both computation time and predictive accuracy.

High-Dimensional Simulation Optimization via Brownian Fields and Sparse Grids

- Computer Science, ArXiv
- 2021

A new sampling algorithm is proposed that converges to a global optimal solution and suffers minimally from the curse of dimensionality, and dramatically outperforms typical alternatives in practice.

A Sparse Expansion For Deep Gaussian Processes

- Computer Science, ArXiv
- 2021

The proposed DTMGP model has the following properties: the outputs of each activation function are deterministic, while the weights are drawn independently from a standard Gaussian distribution; in training or prediction, only O(polylog(M)) activation functions have non-zero outputs, which significantly boosts computational efficiency.

Sample and Computationally Efficient Simulation Metamodeling in High Dimensions

- Computer Science
- 2020

This work develops a novel methodology that dramatically alleviates the curse of dimensionality, and demonstrates via extensive numerical experiments that the methodology can handle problems with a design space of hundreds of dimensions, improving both prediction accuracy and computational efficiency by orders of magnitude relative to typical alternative methods in practice.

## References

Showing 1-10 of 46 references

Learning Bounds for Greedy Approximation with Explicit Feature Maps from Multiple Kernels

- Computer Science, NeurIPS
- 2018

The empirical results show that given a fixed number of explicit features, the method can achieve a lower test error with a smaller time cost, compared to the state-of-the-art in data-dependent random features.

Data-dependent compression of random features for large-scale kernel approximation

- Computer Science, AISTATS
- 2019

This work proposes to combine the simplicity and generality of random feature maps (RFMs) with a data-dependent feature selection scheme, achieving the desirable theoretical approximation properties of Nyström with just O(log J+) features, and shows that the method achieves small kernel-matrix approximation error and better test-set accuracy with provably fewer random features than state-of-the-art methods.

Scalable Learning in Reproducing Kernel Krein Spaces

- Computer Science, ICML
- 2019

We provide the first mathematically complete derivation of the Nyström method for low-rank approximation of indefinite kernels and propose an efficient method for finding an approximate…

Sparse Random Feature Algorithm as Coordinate Descent in Hilbert Space

- Computer Science, NIPS
- 2014

A Sparse Random Features algorithm that learns a sparse nonlinear predictor by minimizing an ℓ1-regularized objective function over the Hilbert space induced by a kernel function; it obtains a sparse solution that requires less memory and prediction time while maintaining comparable performance on regression and classification tasks.
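The ℓ1-regularized random-features objective described above can be sketched in a few lines. For simplicity, this illustration uses proximal gradient (ISTA) on fixed random Fourier features rather than the paper's coordinate descent in Hilbert space, and all names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression data
X = rng.normal(size=(300, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)

# Random Fourier features approximating an RBF kernel
D = 200
W = rng.normal(size=(3, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)

# l1-regularized least squares via ISTA: gradient step + soft threshold
lam = 0.01
step = len(y) / np.linalg.norm(Z, 2) ** 2  # 1 / Lipschitz constant of grad
w = np.zeros(D)
for _ in range(500):
    grad = Z.T @ (Z @ w - y) / len(y)
    w = w - step * grad
    w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # soft threshold

sparsity = float(np.mean(w == 0.0))  # many coefficients end up exactly zero
```

The soft-thresholding step is what zeroes out most feature weights, giving the memory and prediction-time savings the annotation refers to.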

Non-parametric Group Orthogonal Matching Pursuit for Sparse Learning with Multiple Kernels

- Computer Science, NIPS
- 2011

This paper proposes a Group-OMP-based framework for sparse multiple kernel learning (MKL) that decouples the sparsity regularizer from the smoothness regularizer (via RKHS norms), leading to better empirical performance and a simpler optimization procedure that only requires a black-box single-kernel solver.

Orthogonal Random Features

- Computer Science, NIPS
- 2016

We present an intriguing discovery related to Random Fourier Features: replacing multiplication by a random Gaussian matrix with multiplication by a properly scaled random orthogonal matrix…
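The construction in this truncated summary (replacing the random Gaussian matrix with a properly scaled random orthogonal matrix) can be sketched as follows. The chi-distributed row rescaling is part of the standard orthogonal-random-features recipe; the snippet is an illustrative sketch, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8  # input dimension; take one d-by-d block of features

# Standard random Fourier features draw a Gaussian matrix G
G = rng.normal(size=(d, d))

# Orthogonal random features: replace G by diag(s) @ Q, where Q is a
# random orthogonal matrix (QR of a Gaussian matrix) and each row is
# rescaled by a chi-distributed length, so row norms are distributed
# like the norms of Gaussian rows.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
s = np.sqrt(rng.chisquare(df=d, size=d))
W = s[:, None] * Q

gram = W @ W.T  # diagonal up to floating point: rows stay orthogonal
```

Rows of W are exactly orthogonal by construction, which is the geometric property behind the variance reduction the paper reports.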

On Data-Dependent Random Features for Improved Generalization in Supervised Learning

- Computer Science, AAAI
- 2018

This paper proposes the Energy-based Exploration of Random Features (EERF) algorithm, based on a data-dependent score function that explores the set of possible features and exploits the promising regions, and proves that with high probability the proposed score function recovers the spectrum of the best fit within the model class.

A General Scoring Rule for Randomized Kernel Approximation with Application to Canonical Correlation Analysis

- Computer Science, ArXiv
- 2019

A general scoring rule for sampling random features that can be employed for various applications with some adjustments; it provides a principled guide for finding the distribution maximizing the canonical correlations, resulting in a novel data-dependent method for sampling features.

Bayesian Nonparametric Kernel-Learning

- Computer Science, AISTATS
- 2016

Bayesian nonparametric kernel-learning (BaNK) is a generic, data-driven framework for scalable kernel learning that places a nonparametric prior on the spectral distribution of random frequencies, allowing it to both learn kernels and scale to large datasets.

Learning Kernels with Random Features

- Computer Science, NIPS
- 2016

This work presents an efficient optimization problem that learns a kernel in a supervised manner and proves the consistency of the estimated kernel as well as generalization bounds for the class of estimators induced by the optimized kernel.