Generalization bounds for sparse random feature expansions

@article{Hashemi2022GeneralizationBF,
  title={Generalization bounds for sparse random feature expansions},
  author={Abolfazl Hashemi and Hayden Schaeffer and Robert Shi and Ufuk Topcu and Giang Tran and Rachel A. Ward},
  journal={Applied and Computational Harmonic Analysis},
  year={2022}
}

Citations

Conditioning of Random Feature Matrices: Double Descent and Generalization Error

TLDR
It is proved that the risk associated with regression problems using a random feature matrix exhibits the double descent phenomenon, and that this is an effect of the double descent behavior of the condition number.
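
The condition-number effect summarized above is easy to probe numerically. Below is a minimal sketch (a toy experiment, not the paper's setup) that builds a cosine random feature matrix from Gaussian data and prints its condition number as the number of features N crosses the interpolation threshold N = m; the dimensions, activation, and sampling distributions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 200, 10                        # number of samples, input dimension
X = rng.standard_normal((m, d))       # data points (assumed Gaussian here)

# Condition number of the random feature matrix A[j, k] = cos(<x_j, w_k> + b_k)
# as the number of features N sweeps across the interpolation threshold N = m.
for N in [50, 100, 190, 200, 210, 400, 1000]:
    W = rng.standard_normal((d, N))   # random weights
    b = rng.uniform(0, 2 * np.pi, N)  # random phases
    A = np.cos(X @ W + b)             # m x N random feature matrix
    s = np.linalg.svd(A, compute_uv=False)
    print(f"N = {N:4d}   cond(A) = {s[0] / s[-1]:.2e}")
```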

HARFE: Hard-Ridge Random Feature Expansion

TLDR
It is proved that the HARFE method is guaranteed to converge, with an error bound that depends on the noise and on the parameters of the sparse ridge regression model.
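
HARFE couples ridge-regularized random feature regression with a hard-thresholding (sparsifying) step. The sketch below illustrates that general recipe with plain iterative hard thresholding on a ridge-penalized least-squares objective; it is not the authors' implementation, and the feature map, sparsity level s, step size, and penalty lam are illustrative assumptions.

```python
import numpy as np

def hard_threshold(c, s):
    """Keep the s largest-magnitude entries of c and zero out the rest."""
    out = np.zeros_like(c)
    idx = np.argsort(np.abs(c))[-s:]
    out[idx] = c[idx]
    return out

def sparse_ridge_iht(A, y, s, lam=1e-3, step=None, iters=200):
    """Iterative hard thresholding on the ridge-regularized objective
    ||A c - y||^2 + lam ||c||^2, keeping at most s coefficients.
    A rough stand-in for a hard-ridge random feature solver."""
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2   # conservative step size
    c = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ c - y) + lam * c
        c = hard_threshold(c - step * grad, s)
    return c

# Toy usage: cosine random features for a noisy 1D target.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, (300, 1))
y = np.sin(3 * x[:, 0]) + 0.05 * rng.standard_normal(300)
W = rng.standard_normal((1, 500))
b = rng.uniform(0, 2 * np.pi, 500)
A = np.cos(x @ W + b)
c = sparse_ridge_iht(A, y, s=20)
print("nonzero coefficients:", np.count_nonzero(c))
```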

SHRIMP: Sparser Random Feature Models via Iterative Magnitude Pruning

TLDR
It is shown that SHRIMP obtains test accuracy better than or competitive with state-of-the-art sparse feature and additive methods such as SRFE-S, SSAM, and SALSA, and is robust to the pruning rate, indicating robustness in the structure of the obtained subnetworks.
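
SHRIMP obtains its sparse random feature models by iterative magnitude pruning. The sketch below shows the generic fit-prune-refit loop on a ridge-regularized random feature regression; the pruning fraction, number of rounds, and feature map are assumptions for illustration, and this is not the SHRIMP reference code.

```python
import numpy as np

def magnitude_prune_fit(A, y, prune_frac=0.2, rounds=5, ridge=1e-6):
    """Iterative magnitude pruning for random feature regression:
    fit a ridge-regularized least squares model, drop the smallest-magnitude
    fraction of coefficients, and refit on the surviving features."""
    active = np.arange(A.shape[1])
    for _ in range(rounds):
        Asub = A[:, active]
        G = Asub.T @ Asub + ridge * np.eye(active.size)
        c = np.linalg.solve(G, Asub.T @ y)            # ridge solve on active set
        keep = np.argsort(np.abs(c))[int(prune_frac * active.size):]
        active = active[np.sort(keep)]                # prune smallest coefficients
    # Final refit on the pruned support.
    Asub = A[:, active]
    G = Asub.T @ Asub + ridge * np.eye(active.size)
    coef = np.zeros(A.shape[1])
    coef[active] = np.linalg.solve(G, Asub.T @ y)
    return coef

# Toy usage: cosine random features for a noisy 1D target.
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, (300, 1))
y = np.sin(3 * x[:, 0]) + 0.05 * rng.standard_normal(300)
W = rng.standard_normal((1, 400))
b = rng.uniform(0, 2 * np.pi, 400)
A = np.cos(x @ W + b)
coef = magnitude_prune_fit(A, y)
print("surviving features:", np.count_nonzero(coef))
```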

Towards optimal sampling for learning sparse approximation in high dimensions

TLDR
This chapter discusses recent work on learning sparse approximations to high-dimensional functions from data, where the target functions may be scalar-, vector-, or even Hilbert space-valued, and describes a general construction of sampling measures that improves over standard Monte Carlo sampling.

Concentration of Random Feature Matrices in High-Dimensions

TLDR
This work shows that the singular values of random feature matrices concentrate near their full expectation and near one with high probability, and since the dimension depends only on the logarithm of the number of random weights or the number of data points, complexity bounds can be achieved even in moderate dimensions for many practical settings.

Learning Sparse Mixture Models

  • F. Ba
  • Computer Science
  • 2022
TLDR
This work approximates high-dimensional density functions that have an ANOVA-like sparse structure with mixtures of wrapped Gaussian and von Mises distributions, and develops an algorithm that determines the mixture model's set of active variables via Kolmogorov-Smirnov and correlation tests.

Sparse mixture models inspired by ANOVA decompositions

Inspired by the analysis of variance (ANOVA) decomposition of functions, this work proposes a Gaussian-uniform mixture model on the high-dimensional torus, relying on the assumption that the function to be approximated is well explained by a small number of variable interactions.

Renormalized Sparse Neural Network Pruning

TLDR
It is proven that the renormalized sparse neural network pruning method's error converges to zero as the network parameters cluster or concentrate, whereas without renormalization the error does not converge to zero in general.

Structured random receptive fields enable informative sensory encodings

TLDR
This work models neuronal receptive fields as random, variable samples from parameterized distributions and demonstrates this model in two sensory modalities using data from insect mechanosensors and mammalian primary visual cortex, leading to a significant theoretical connection between the foundational concept of receptive fields and random features, a leading theory for understanding artificial neural networks.

SRMD: Sparse Random Mode Decomposition

TLDR
This work proposes a random feature method for analyzing time-series data by constructing a sparse approximation to the spectrogram, which outperforms other state-of-the-art decomposition methods.

References

SHOWING 1-10 OF 65 REFERENCES

Sparse Random Feature Algorithm as Coordinate Descent in Hilbert Space

TLDR
A Sparse Random Features algorithm learns a sparse non-linear predictor by minimizing an l1-regularized objective function over the Hilbert space induced by a kernel function, obtaining a sparse solution that requires less memory and prediction time while maintaining comparable performance on regression and classification tasks.
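
A quick way to see the sparse random feature idea in practice is to fit an l1-regularized linear model on top of random Fourier features, so that most feature coefficients are driven exactly to zero. The sketch below does this with scikit-learn's Lasso; it is not the paper's coordinate descent algorithm in Hilbert space, and the kernel bandwidth, feature count, and regularization strength are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)

# Toy data: noisy 2D target.
X = rng.uniform(-1, 1, (500, 2))
y = np.sin(2 * X[:, 0]) * np.cos(X[:, 1]) + 0.05 * rng.standard_normal(500)

# Random Fourier features approximating a Gaussian kernel
# (the bandwidth 0.5 is an arbitrary choice for this example).
N = 1000
W = rng.standard_normal((2, N)) / 0.5
b = rng.uniform(0, 2 * np.pi, N)
A = np.sqrt(2.0 / N) * np.cos(X @ W + b)

# l1-regularized fit: only a small subset of the random features
# survives with a nonzero coefficient in the final predictor.
model = Lasso(alpha=1e-3, max_iter=10000).fit(A, y)
print("features kept:", np.count_nonzero(model.coef_), "out of", N)
```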

Optimal Rates for Random Fourier Features

TLDR
A detailed finite-sample theoretical analysis of the approximation quality of RFFs is provided by establishing optimal (in terms of the RFF dimension and growing set size) performance guarantees in uniform norm, and presenting guarantees in L^r (1 ≤ r < ∞) norms.

Towards a Unified Analysis of Random Fourier Features

TLDR
This work provides the first unified risk analysis of learning with random Fourier features using the squared error and Lipschitz continuous loss functions, and devises a simple approximation scheme which provably reduces the computational cost without loss of statistical efficiency.

On the Error of Random Fourier Features

TLDR
The existing uniform error bound on random Fourier features is improved, and novel understanding of the embedding's variance, approximation error, and use in some machine learning methods is provided.

Generalization Properties of Learning with Random Features

TLDR
The results shed light on the statistical-computational trade-offs in large-scale kernelized learning, showing the potential effectiveness of random features in reducing the computational complexity while keeping optimal generalization properties.

Uniform approximation of functions with random bases

  • A. Rahimi, B. Recht
  • Computer Science
  • 2008 46th Annual Allerton Conference on Communication, Control, and Computing
  • 2008
TLDR
Using techniques from probability on Banach spaces, a specific architecture of random nonlinearities is analyzed, L∞ and L2 error bounds for approximating functions in reproducing kernel Hilbert spaces are provided, and scenarios in which these expansions are dense in the continuous functions are discussed.

Group Sparse Additive Machine

TLDR
A generalization error bound is derived by integrating the sample error analysis with empirical covering numbers and the hypothesis error estimate with the stepping-stone technique, and the new bound shows that GroupSAM can achieve a satisfactory learning rate with polynomial decay.

Extracting structured dynamical systems using sparse optimization with very few samples

TLDR
A random sampling method for learning structured dynamical systems from under-sampled and possibly noisy state-space measurements is proposed, with theoretical guarantees, based on a Bernstein-like inequality for partly dependent random variables, on the recovery rate of the sparse coefficients and the identification of the candidate functions.

Polynomial approximation via compressed sensing of high-dimensional functions on lower sets

TLDR
This work proposes and analyzes a compressed sensing approach to polynomial approximation of complex-valued functions in high dimensions, presenting an innovative weighted $\ell_1$ minimization procedure with a precise choice of weights, as well as a new iterative hard thresholding method, for imposing the downward-closed preference.

Interpolation via weighted $l_1$ minimization

...