• Corpus ID: 384343

Less is More: Nyström Computational Regularization

Alessandro Rudi, Raffaello Camoriano, Lorenzo Rosasco
We study Nyström-type subsampling approaches to large-scale kernel methods, and prove learning bounds in the statistical learning setting, where random sampling and high-probability estimates are considered. In particular, we prove that these approaches can achieve optimal learning bounds, provided the subsampling level is suitably chosen. These results suggest a simple incremental variant of Nyström Kernel Regularized Least Squares, where the subsampling level implements a form of…
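
As an illustration of the estimator the abstract refers to, here is a minimal sketch of Nyström Kernel Regularized Least Squares with plain uniform subsampling. It assumes the commonly stated form of the solution, alpha = (K_nm^T K_nm + lambda n K_mm)^+ K_nm^T y, where the m centers are drawn uniformly without replacement from the training set. The Gaussian kernel, the function names, and all parameter values are assumptions made for this example, not details taken from the paper, and the incremental variant mentioned in the abstract is not shown.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between rows of A and rows of B (assumed kernel choice)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def nystrom_krls_fit(X, y, m, lam, sigma=1.0, rng=None):
    """Fit Nystrom KRLS with m uniformly subsampled centers (hypothetical helper)."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)   # uniform subsampling of the training points
    centers = X[idx]
    K_nm = gaussian_kernel(X, centers, sigma)    # n x m
    K_mm = gaussian_kernel(centers, centers, sigma)  # m x m
    # Solve the m x m system; the pseudoinverse guards against a singular K_mm.
    A = K_nm.T @ K_nm + lam * n * K_mm
    alpha = np.linalg.pinv(A) @ (K_nm.T @ y)
    return centers, alpha

def nystrom_krls_predict(X_new, centers, alpha, sigma=1.0):
    """Evaluate the fitted function at new points."""
    return gaussian_kernel(X_new, centers, sigma) @ alpha

# Usage on synthetic data: noisy samples of sin(x), fitted with m = 20 centers.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
centers, alpha = nystrom_krls_fit(X, y, m=20, lam=1e-3, sigma=1.0, rng=1)
X_test = np.linspace(-3.0, 3.0, 100)[:, None]
mse = np.mean((nystrom_krls_predict(X_test, centers, alpha) - np.sin(X_test[:, 0])) ** 2)
```

In this sketch the linear system has size m rather than n, which is where the computational saving comes from; the subsampling level m and the regularization parameter lam both control how strongly the fit is regularized.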

Nyström type subsampling analyzed as a regularized projection

In the statistical learning theory the Nyström type subsampling methods are considered as tools for dealing with big data. In this paper we consider Nyström subsampling as a special form of the

Analysis of regularized Nyström subsampling for regression functions of low smoothness

This paper studies a Nyström-type subsampling approach to large kernel learning methods in the misspecified case, where the target function is not assumed to belong to the reproducing kernel Hilbert

NYTRO: When Subsampling Meets Early Stopping

This paper considers the question in a least squares regression setting and proposes a form of randomized iterative regularization based on early stopping and subsampling, which is complemented and validated by a thorough experimental analysis.

Optimal Rates for Learning with Nyström Stochastic Gradient Methods

The results indicate that using mini-batches can reduce the total computational cost while achieving the same optimal statistical results, and improve the computational complexity of the studied algorithm.

Recursive Sampling for the Nystrom Method

We give the first algorithm for kernel Nystrom approximation that runs in linear time in the number of training points and is provably accurate for all kernel matrices, without dependence on

Sharp Theoretical Analysis for Nonparametric Testing under Random Projection

This paper develops computationally efficient nonparametric testing by employing a random projection strategy in the specific kernel ridge regression setup and derives the minimum number of random projections that is sufficient for achieving testing optimality in terms of the minimax rate.

Nyström Kernel Mean Embeddings

An upper bound on the approximation error of the Nyström method is found, which yields sufficient conditions on the subsample size to obtain the standard $n^{-1/2}$ rate while reducing computational costs.

Gain with no Pain: Efficient Kernel-PCA by Nyström Sampling

This analysis shows that Nystrom sampling greatly improves computational efficiency without incurring any loss of statistical accuracy in PCA, the first such result for PCA.

Manifold regularization based on Nyström type subsampling

A theoretical analysis of multi-penalty least-square regularization scheme under the general source condition in vector-valued function setting is developed and the results can also be applied to multi-task learning problems.



Ensemble Nystrom Method

A new family of algorithms based on mixtures of Nyström approximations, the ensemble Nyström algorithms, is introduced; these yield more accurate low-rank approximations than the standard Nyström method.

Optimal Rates for the Regularized Least-Squares Algorithm

A complete minimax analysis of the problem is described, showing that the convergence rates obtained by regularized least-squares estimators are indeed optimal over a suitable class of priors defined by the considered kernel.

Improved Nyström low-rank approximation and error analysis

An error analysis that directly relates the Nyström approximation quality with the encoding powers of the landmark points in summarizing the data is presented, and the resultant error bound suggests a simple and efficient sampling scheme, the k-means clustering algorithm, for Nyström low-rank approximation.

Revisiting the Nystrom Method for Improved Large-scale Machine Learning

An empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices is complemented by a suite of worst-case theoretical bounds for both random sampling and random projection methods.

Online Gradient Descent Learning Algorithms

It is shown that, although the algorithm does not involve an explicit RKHS regularization term, choosing the step sizes appropriately can yield competitive error rates with those in the literature.

Cross-validation based adaptation for regularization operators in learning theory

Results prove adaptation of the rate of convergence of the estimators to the minimax rate induced by the "effective dimension" of the problem and show universal consistency for this broad class of methods which includes regularized least-squares, truncated SVD, Landweber iteration and ν-method.

Sampling Methods for the Nyström Method

This work reports results of extensive experiments that provide a detailed comparison of various fixed and adaptive sampling techniques, and demonstrates the performance improvement associated with the ensemble Nystrom method when used in conjunction with either fixed or adaptive sampling schemes.

Scalable Kernel Methods via Doubly Stochastic Gradients

An approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients" based on the fact that many kernel methods can be expressed as convex optimization problems, which can readily scale kernel methods up to the regimes which are dominated by neural nets.

Divide and Conquer Kernel Ridge Regression

The main theorem establishes that despite the computational speed-up, statistical optimality is retained: if m is not too large, the partition-based estimate achieves optimal rates of convergence for the full sample size N.

A novel greedy algorithm for Nyström approximation

A novel recursive algorithm for calculating the Nystrom approximation, and an effective greedy criterion for column selection are presented, and a very efficient variant is proposed for greedy sampling, which works on random partitions of data instances.