# Less is More: Nyström Computational Regularization

```bibtex
@inproceedings{Rudi2015LessIM,
  title     = {Less is More: Nystr{\"o}m Computational Regularization},
  author    = {Alessandro Rudi and Raffaello Camoriano and Lorenzo Rosasco},
  booktitle = {NIPS},
  year      = {2015}
}
```

We study Nyström-type subsampling approaches to large-scale kernel methods, and prove learning bounds in the statistical learning setting, where random sampling and high-probability estimates are considered. In particular, we prove that these approaches can achieve optimal learning bounds, provided the subsampling level is suitably chosen. These results suggest a simple incremental variant of Nyström Kernel Regularized Least Squares, where the subsampling level implements a form of…
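The core computation behind Nyström KRLS can be sketched as follows. This is a minimal illustration of plain Nyström kernel regularized least squares with uniform landmark subsampling, not the incremental variant the paper proposes; the Gaussian kernel, bandwidth, and regularization values are placeholder choices.

```python
import numpy as np

def rbf(A, B, sigma):
    # Gaussian (RBF) kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def nystrom_krls_fit(X, y, m, lam, sigma, seed=0):
    """Fit Nystrom KRLS: pick m landmarks uniformly at random and solve
    the reduced m x m system instead of the full n x n one."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    landmarks = X[rng.choice(n, size=m, replace=False)]
    Knm = rbf(X, landmarks, sigma)          # n x m cross-kernel block
    Kmm = rbf(landmarks, landmarks, sigma)  # m x m landmark block
    # Normal equations of the subsampled problem:
    #   (Knm^T Knm + n * lam * Kmm) alpha = Knm^T y
    # Cost is O(n m^2 + m^3) rather than O(n^3) for exact KRLS.
    A = Knm.T @ Knm + n * lam * Kmm + 1e-10 * np.eye(m)  # jitter for stability
    alpha = np.linalg.solve(A, Knm.T @ y)
    return landmarks, alpha

def nystrom_krls_predict(Z, landmarks, alpha, sigma):
    # Predictor lives in the span of the m landmark features.
    return rbf(Z, landmarks, sigma) @ alpha
```

In this sketch the subsampling level `m` plays the computational-regularization role the paper highlights: increasing `m` raises the cost and weakens the implicit regularization, so `m` can be tuned like a regularization parameter.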

## 212 Citations

### Nyström type subsampling analyzed as a regularized projection

- Mathematics
- 2017

In statistical learning theory, Nyström-type subsampling methods are considered tools for dealing with big data. In this paper we consider Nyström subsampling as a special form of the…

### Analysis of regularized Nyström subsampling for regression functions of low smoothness

- Computer Science, Analysis and Applications
- 2019

This paper studies a Nyström-type subsampling approach to large kernel learning methods in the misspecified case, where the target function is not assumed to belong to the reproducing kernel Hilbert…

### NYTRO: When Subsampling Meets Early Stopping

- Computer Science, AISTATS
- 2016

This paper considers the question in a least squares regression setting and proposes a form of randomized iterative regularization based on early stopping and subsampling, which is complemented and validated by a thorough experimental analysis.

### Optimal Rates for Learning with Nyström Stochastic Gradient Methods

- Computer Science, arXiv
- 2017

The results indicate that using mini-batches can reduce the total computational cost while achieving the same optimal statistical results, and improve the computational complexity of the studied algorithm.

### Recursive Sampling for the Nyström Method

- Computer Science, NIPS
- 2017

We give the first algorithm for kernel Nyström approximation that runs in linear time in the number of training points and is provably accurate for all kernel matrices, without dependence on…

### Sharp Theoretical Analysis for Nonparametric Testing under Random Projection

- Computer Science, COLT
- 2019

This paper develops computationally efficient nonparametric testing by employing a random projection strategy in the specific kernel ridge regression setup and derives the minimum number of random projections that is sufficient for achieving testing optimality in terms of the minimax rate.

### Nyström Kernel Mean Embeddings

- Computer Science, Mathematics, ICML
- 2022

An upper bound on the approximation error of the Nyström method is found, which yields sufficient conditions on the subsample size to obtain the standard n^{-1/2} rate while reducing computational costs.
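The idea can be sketched concretely: project the empirical kernel mean embedding (1/n) Σᵢ k(xᵢ, ·) onto the span of m landmark features, which only needs the m × n and m × m kernel blocks. Uniform landmark selection and the Gaussian kernel below are placeholder assumptions, not necessarily the paper's exact scheme.

```python
import numpy as np

def rbf(A, B, sigma):
    # Gaussian (RBF) kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def nystrom_mean_embedding(X, m, sigma, seed=0):
    """Project the empirical mean embedding onto the span of m landmark
    features: solve Kmm beta = (1/n) Kmn 1 for the coefficients beta."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    landmarks = X[rng.choice(n, size=m, replace=False)]
    Kmn = rbf(landmarks, X, sigma)          # m x n cross-kernel block
    Kmm = rbf(landmarks, landmarks, sigma)  # m x m landmark block
    beta = np.linalg.solve(Kmm + 1e-8 * np.eye(m), Kmn.mean(axis=1))
    return landmarks, beta

def embedding_at(Z, landmarks, beta, sigma):
    # Evaluate the approximate embedding mu_tilde(z) = sum_j beta_j k(z_j, z).
    return rbf(Z, landmarks, sigma) @ beta
```

Evaluating the exact empirical embedding at a point costs O(n); after the one-off O(n m + m^3) fit, each evaluation of the Nyström embedding costs only O(m).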

### Gain with no Pain: Efficient Kernel-PCA by Nyström Sampling

- Computer Science, arXiv
- 2019

This analysis shows that Nyström sampling greatly improves computational efficiency without incurring any loss of statistical accuracy in PCA; this is the first such result for PCA.

### Manifold regularization based on Nyström type subsampling

- Computer Science
- 2017

A theoretical analysis of a multi-penalty least-squares regularization scheme under a general source condition in the vector-valued function setting is developed; the results can also be applied to multi-task learning problems.

## References

Showing 1-10 of 45 references

### Ensemble Nyström Method

- Computer Science, NIPS
- 2009

A new family of algorithms based on mixtures of Nyström approximations, ensemble Nyström algorithms, is introduced; these yield more accurate low-rank approximations than the standard Nyström method.

### Optimal Rates for the Regularized Least-Squares Algorithm

- Mathematics, Computer Science, Found. Comput. Math.
- 2007

A complete minimax analysis of the problem is described, showing that the convergence rates obtained by regularized least-squares estimators are indeed optimal over a suitable class of priors defined by the considered kernel.

### Improved Nyström low-rank approximation and error analysis

- Computer Science, ICML '08
- 2008

An error analysis that directly relates the Nyström approximation quality to the encoding power of the landmark points in summarizing the data is presented, and the resultant error bound suggests a simple and efficient sampling scheme, k-means clustering, for Nyström low-rank approximation.

### Revisiting the Nyström Method for Improved Large-scale Machine Learning

- Computer Science, J. Mach. Learn. Res.
- 2016

An empirical evaluation of the approximation quality and running time of sampling and projection methods on a diverse suite of SPSD matrices is presented, complemented by a suite of worst-case theoretical bounds for both random sampling and random projection methods.

### Online Gradient Descent Learning Algorithms

- Computer Science, Found. Comput. Math.
- 2008

It is shown that, although the algorithm does not involve an explicit RKHS regularization term, choosing the step sizes appropriately can yield competitive error rates with those in the literature.

### Cross-validation based adaptation for regularization operators in learning theory

- Mathematics, Computer Science
- 2010

Results prove adaptation of the rate of convergence of the estimators to the minimax rate induced by the "effective dimension" of the problem, and show universal consistency for this broad class of methods, which includes regularized least squares, truncated SVD, Landweber iteration, and the ν-method.

### Sampling Methods for the Nyström Method

- Computer Science, J. Mach. Learn. Res.
- 2012

This work reports results of extensive experiments that provide a detailed comparison of various fixed and adaptive sampling techniques, and demonstrates the performance improvement associated with the ensemble Nystrom method when used in conjunction with either fixed or adaptive sampling schemes.

### Scalable Kernel Methods via Doubly Stochastic Gradients

- Computer Science, NIPS
- 2014

An approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients", which exploits the fact that many kernel methods can be expressed as convex optimization problems; the approach can readily scale kernel methods up to regimes dominated by neural nets.

### Divide and Conquer Kernel Ridge Regression

- Computer Science, Mathematics, COLT
- 2013

The main theorem establishes that despite the computational speed-up, statistical optimality is retained: if m is not too large, the partition-based estimate achieves optimal rates of convergence for the full sample size N.
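A minimal sketch of the divide-and-conquer scheme (the Gaussian kernel and parameter values below are placeholder assumptions): split the n points into m disjoint random subsets, fit an independent kernel ridge regression estimator on each, and average the m predictors.

```python
import numpy as np

def rbf(A, B, sigma):
    # Gaussian (RBF) kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def krr_fit(X, y, lam, sigma):
    # Exact kernel ridge regression on one partition: O(n_i^3) per block.
    n = X.shape[0]
    return np.linalg.solve(rbf(X, X, sigma) + n * lam * np.eye(n), y)

def dc_krr(X, y, n_parts, lam, sigma, seed=0):
    """Divide-and-conquer KRR: fit KRR on n_parts random disjoint
    subsets of the data and average the resulting predictors."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(X.shape[0]), n_parts)
    models = [(X[p], krr_fit(X[p], y[p], lam, sigma)) for p in parts]

    def predict(Z):
        return np.mean([rbf(Z, Xi, sigma) @ ai for Xi, ai in models], axis=0)

    return predict
```

With m blocks of n/m points each, the total cost drops from O(n^3) for exact KRR to roughly m · O((n/m)^3) = O(n^3 / m^2), which is the speed-up whose statistical price the theorem bounds.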

### A novel greedy algorithm for Nyström approximation

- Computer Science, AISTATS
- 2011

A novel recursive algorithm for calculating the Nyström approximation and an effective greedy criterion for column selection are presented, together with a very efficient variant of greedy sampling that works on random partitions of data instances.