Corpus ID: 53109366

On Fast Leverage Score Sampling and Optimal Learning

@article{Rudi2018OnFL,
  title={On Fast Leverage Score Sampling and Optimal Learning},
  author={Alessandro Rudi and Daniele Calandriello and Luigi Carratino and Lorenzo Rosasco},
  journal={ArXiv},
  year={2018},
  volume={abs/1810.13258}
}
Leverage score sampling provides an appealing way to perform approximate computations for large matrices. Indeed, it allows one to derive faithful approximations with a complexity adapted to the problem at hand. Yet, performing leverage score sampling is a challenge in its own right, requiring further approximations. In this paper, we study the problem of leverage score sampling for positive definite matrices defined by a kernel. Our contribution is twofold. First we provide a novel algorithm for… 
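
To make the object of study concrete, here is a minimal NumPy sketch of ridge leverage scores for a kernel matrix, with landmarks sampled proportionally to them. It computes the scores exactly at O(n^3) cost, which is precisely the expense that fast approximation schemes such as the one proposed in this paper aim to avoid; the Gaussian kernel, the regularization lam, and the landmark budget are illustrative choices, not taken from the paper.

```python
# Exact ridge leverage scores of a kernel matrix, then landmark sampling.
# Illustrative sketch only: the O(n^3) exact computation is what fast
# leverage score sampling methods are designed to bypass.
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def ridge_leverage_scores(K, lam):
    # l_i(lam) = [K (K + lam * n * I)^{-1}]_{ii}
    n = K.shape[0]
    return np.diag(K @ np.linalg.inv(K + lam * n * np.eye(n)))

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
K = gaussian_kernel(X, X)
scores = ridge_leverage_scores(K, lam=1e-3)        # lam is a placeholder value
probs = scores / scores.sum()
landmarks = rng.choice(len(X), size=50, replace=False, p=probs)
K_nm = K[:, landmarks]   # columns kept for a Nystrom-style approximation
```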

Citations

On Sampling Random Features From Empirical Leverage Scores: Implementation and Theoretical Guarantees

An out-of-sample performance bound is established for empirical leverage scores, revealing a trade-off between the approximated kernel and the eigenvalue decay of another kernel in the domain of random features, defined based on the data distribution.

Fast Statistical Leverage Score Approximation in Kernel Ridge Regression

A linear-time (up to polylogarithmic factors) algorithm is proposed to accurately approximate the statistical leverage scores in stationary-kernel-based KRR, with theoretical guarantees; it is orders of magnitude more efficient than existing methods at selecting representative sub-samples for the Nyström approximation.

Fast Algorithms for Monotone Lower Subsets of Kronecker Least Squares Problems

This paper develops efficient leverage-score-based sampling methods for matrices with certain Kronecker product-type structure; numerical examples show that, for a class of structured matrices, sketches based on exact leverage score sampling achieve a superior residual compared to approximate leverage score sampling methods.
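
For context, the sketch below illustrates plain leverage score row sampling for an overdetermined least squares problem; it is not the Kronecker-structured algorithm of the cited paper, and the matrix sizes and sample budget m are arbitrary.

```python
# Leverage score row sampling for min_x ||Ax - b||, with importance-sampling
# rescaling of the kept rows. Illustrative sketch, not the cited algorithm.
import numpy as np

def leverage_scores(A):
    Q, _ = np.linalg.qr(A, mode='reduced')   # exact scores via a thin QR
    return (Q ** 2).sum(axis=1)              # l_i = ||Q_{i,:}||^2

def sketched_lstsq(A, b, m, rng):
    p = leverage_scores(A)
    p = p / p.sum()
    idx = rng.choice(A.shape[0], size=m, replace=True, p=p)
    w = 1.0 / np.sqrt(m * p[idx])            # importance-sampling weights
    x, *_ = np.linalg.lstsq(A[idx] * w[:, None], b[idx] * w, rcond=None)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20000, 40))
b = A @ rng.standard_normal(40) + 0.1 * rng.standard_normal(20000)
x_sketch = sketched_lstsq(A, b, m=800, rng=rng)   # m is a placeholder budget
```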

p-Sparsified Sketches for Fast Multiple Output Kernel Methods

This work derives excess risk bounds for both single- and multiple-output problems with generic Lipschitz losses, providing new guarantees for a wide range of applications, from robust regression to multiple quantile regression, and provides empirical evidence of the superiority of the sketches over recent state-of-the-art approaches.

Learning with SGD and Random Features

This study highlights how different parameters, such as the number of features, the number of iterations, the step-size, and the mini-batch size, control the learning properties of the solutions, by deriving optimal finite-sample bounds under standard assumptions.
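
As a rough illustration of this setting, the following sketch runs mini-batch SGD on random Fourier features for a simple regression problem; the feature count, step size, batch size, and regularization are placeholder values, not the tuned choices analyzed in the paper.

```python
# Mini-batch SGD on random Fourier features for least squares regression.
# Illustrative hyperparameters only.
import numpy as np

rng = np.random.default_rng(0)
n, d, D = 5000, 10, 300                 # samples, input dim, random features
sigma, lam, step, batch = 1.0, 1e-4, 0.5, 32

X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

W = rng.standard_normal((d, D)) / sigma          # frequencies for a Gaussian kernel
phase = rng.uniform(0, 2 * np.pi, D)
phi = lambda Z: np.sqrt(2.0 / D) * np.cos(Z @ W + phase)

w = np.zeros(D)
for it in range(2000):
    i = rng.integers(0, n, size=batch)
    Z = phi(X[i])
    grad = Z.T @ (Z @ w - y[i]) / batch + lam * w
    w -= step * grad
```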

Towards a Unified Analysis of Random Fourier Features

  • Zhu Li et al. · Computer Science · 2019
This work provides the first unified risk analysis of learning with random Fourier features using the squared error and Lipschitz continuous loss functions, and devises a simple approximation scheme which provably reduces the computational cost without loss of statistical efficiency.
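
The sketch below shows the basic random Fourier feature construction that such analyses concern: a Gaussian kernel approximated by an inner product of cosine features. Sizes and bandwidth are arbitrary, and the approximation scheme devised in the cited paper is not reproduced here.

```python
# Random Fourier features: K(x, y) = exp(-||x - y||^2 / (2 sigma^2)) is
# approximated by phi(x)^T phi(y); the error shrinks as D grows.
import numpy as np

rng = np.random.default_rng(1)
n, d, D, sigma = 300, 5, 2000, 1.0
X = rng.standard_normal((n, d))

W = rng.standard_normal((d, D)) / sigma
b = rng.uniform(0, 2 * np.pi, D)
Phi = np.sqrt(2.0 / D) * np.cos(X @ W + b)

K_exact = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / (2 * sigma ** 2))
K_rff = Phi @ Phi.T
print(np.abs(K_exact - K_rff).max())
```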

Weighted Gradient Coding with Leverage Score Sampling

A novel weighted leverage score approach is presented that achieves improved performance for distributed gradient coding by utilizing importance sampling, and that provides a compressed approximation of a data matrix using an importance-weighted subset.

Gain with no Pain: Efficiency of Kernel-PCA by Nyström Sampling

This analysis shows that Nyström sampling greatly improves computational efficiency without incurring any loss of statistical accuracy in kernel PCA, the first such result for PCA.
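
A minimal sketch of Nyström-accelerated kernel PCA follows, assuming uniform landmark sampling and omitting feature-space centering for brevity; the kernel, landmark count, and number of components are placeholders rather than the settings studied in the paper.

```python
# Nystrom kernel PCA sketch: eigenvectors of K_nm K_mm^{-1} K_mn are
# recovered from an SVD of the n x m matrix K_nm K_mm^{-1/2}.
# Feature-space centering is omitted to keep the example short.
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))
m, k = 100, 3                                   # landmarks, components (placeholders)

idx = rng.choice(len(X), size=m, replace=False)
K_nm = gaussian_kernel(X, X[idx])
K_mm = gaussian_kernel(X[idx], X[idx])

w, V = np.linalg.eigh(K_mm)                     # K_mm^{-1/2} via eigendecomposition
w = np.clip(w, 1e-12, None)
C = K_nm @ (V @ np.diag(1.0 / np.sqrt(w)) @ V.T)

U, S, _ = np.linalg.svd(C, full_matrices=False)
components = U[:, :k]                           # approximate top-k kernel PCs
eigvals = S[:k] ** 2
```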

Leverage Score Sampling for Complete Mode Coverage in Generative Adversarial Networks

This work proposes a sampling procedure based on ridge leverage scores which significantly improves mode coverage when compared to standard methods and can easily be combined with any GAN.

...

References

Showing 1-10 of 30 references

Recursive Sampling for the Nystrom Method

We give the first algorithm for kernel Nyström approximation that runs in linear time in the number of training points and is provably accurate for all kernel matrices, without dependence on regularity or incoherence conditions.

Fast approximation of matrix coherence and statistical leverage

A randomized algorithm is proposed that takes as input an arbitrary n × d matrix A, with n ≫ d, and returns, as output, relative-error approximations to all n of the statistical leverage scores.
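
The following is a simplified, hedged variant of this idea using a dense Gaussian sketch: take R from a QR factorization of the sketched matrix and read approximate scores off the row norms of A R^{-1}. The cited algorithm uses structured transforms (and a second projection) to obtain its running-time guarantees; the sketch size r below is an arbitrary choice.

```python
# Approximate leverage scores of a tall matrix A (n >> d) via a Gaussian
# sketch. Simplified illustration of the fast-approximation idea.
import numpy as np

def approx_leverage_scores(A, r, rng):
    n, d = A.shape
    S = rng.standard_normal((r, n)) / np.sqrt(r)   # Gaussian sketch, r >= d
    _, R = np.linalg.qr(S @ A, mode='reduced')     # R is d x d
    B = np.linalg.solve(R.T, A.T).T                # B = A R^{-1}
    return (B ** 2).sum(axis=1)

rng = np.random.default_rng(0)
A = rng.standard_normal((20000, 30))
approx = approx_leverage_scores(A, r=300, rng=rng)

Q, _ = np.linalg.qr(A, mode='reduced')             # exact scores, for comparison
exact = (Q ** 2).sum(axis=1)
print(np.abs(approx - exact).max())
```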

Fast Randomized Kernel Methods With Statistical Guarantees

A version of this approach is described that comes with running-time guarantees as well as improved guarantees on its statistical performance, together with a new notion of the statistical leverage of a data point that captures, in a fine-grained way, the difficulty of the original statistical learning problem.

Learning with SGD and Random Features

This study highlights how different parameters, such as the number of features, the number of iterations, the step-size, and the mini-batch size, control the learning properties of the solutions, by deriving optimal finite-sample bounds under standard assumptions.

FALKON: An Optimal Large Scale Kernel Method

This paper proposes FALKON, a novel algorithm that can efficiently process millions of points, derived by combining several algorithmic principles, namely stochastic subsampling, iterative solvers, and preconditioning.

Less is More: Nyström Computational Regularization

A simple incremental variant of Nyström kernel regularized least squares is suggested, where the subsampling level implements a form of computational regularization, in the sense that it controls regularization and computation at the same time.
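
A minimal sketch of Nyström kernel ridge regression in this spirit is shown below, with m uniformly sampled landmarks acting as the computational-regularization knob; the kernel, lam, and m are placeholder choices, not the paper's incremental scheme.

```python
# Nystrom kernel ridge regression: solve in the span of m landmark columns.
# (K_nm^T K_nm + lam * n * K_mm) alpha = K_nm^T y,  f(x) = k(x, landmarks) @ alpha
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def nystrom_krr_fit(X, y, m, lam, rng, sigma=1.0):
    idx = rng.choice(len(X), size=m, replace=False)   # uniform landmark sampling
    K_nm = gaussian_kernel(X, X[idx], sigma)
    K_mm = gaussian_kernel(X[idx], X[idx], sigma)
    A = K_nm.T @ K_nm + lam * len(X) * K_mm
    alpha = np.linalg.solve(A + 1e-10 * np.eye(m), K_nm.T @ y)
    return X[idx], alpha

def nystrom_krr_predict(X_test, landmarks, alpha, sigma=1.0):
    return gaussian_kernel(X_test, landmarks, sigma) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(3000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(3000)
landmarks, alpha = nystrom_krr_fit(X, y, m=100, lam=1e-6, rng=rng)
y_hat = nystrom_krr_predict(X, landmarks, alpha)
```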

A Structured Prediction Approach for Label Ranking

We propose to solve a label ranking problem as a structured output regression task. In this view, we adopt a least squares surrogate loss approach that solves a supervised learning problem in two steps: a regression step and a pre-image (or decoding) step.

Efficient Second-Order Online Kernel Learning with Adaptive Embedding

This paper proposes PROS-N-KONS, a method that uses Nyström sketching to project the input point into a small, accurate embedded space, performs efficient second-order updates in this space, and achieves logarithmic regret.

Optimal Rates for Regularized Least Squares Regression

A new oracle inequality is established for kernel-based, regularized least squares regression methods, which uses the eigenvalues of the associated integral operator as a complexity measure; the resulting rates turn out to be independent of the exponent of the regularization term.

Sharp analysis of low-rank kernel matrix approximations

This paper shows that in the context of kernel ridge regression, for approximations based on a random subset of columns of the original kernel matrix, the rank p may be chosen to be linear in the degrees of freedom associated with the problem, a quantity which is classically used in the statistical analysis of such methods.
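
To make the quantity concrete, the following snippet evaluates one common normalization of the degrees of freedom, d_eff(lam) = trace(K (K + lam * n * I)^{-1}), on a toy kernel matrix; the kernel and the lam grid are illustrative, and other normalizations of lam appear in the literature.

```python
# Effective degrees of freedom of a kernel ridge regression problem, a
# quantity that can guide the choice of approximation rank.
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
K = gaussian_kernel(X)
n = len(X)

for lam in [1e-1, 1e-2, 1e-3, 1e-4]:
    d_eff = np.trace(K @ np.linalg.inv(K + lam * n * np.eye(n)))
    print(f"lam={lam:.0e}  d_eff={d_eff:.1f}")
```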