Optimal Rates for the Regularized Least-Squares Algorithm

@article{Caponnetto2007OptimalRF,
  title={Optimal Rates for the Regularized Least-Squares Algorithm},
  author={Andrea Caponnetto and Ernesto de Vito},
  journal={Foundations of Computational Mathematics},
  year={2007},
  volume={7},
  pages={331-368}
}
  • A. Caponnetto, E. De Vito
  • Published 1 July 2007
  • Mathematics, Computer Science
  • Foundations of Computational Mathematics
We develop a theoretical analysis of the performance of the regularized least-square algorithm on a reproducing kernel Hilbert space in the supervised learning setting. The presented results hold in the general framework of vector-valued functions; therefore they can be applied to multitask problems. In particular, we observe that the concept of effective dimension plays a central role in the definition of a criterion for the choice of the regularization parameter as a function of the number of… 
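
The abstract's central objects, a regularized least-squares estimator in a reproducing kernel Hilbert space and a regularization parameter chosen as a function of the sample size n, can be illustrated with a minimal sketch. The Gaussian kernel, the synthetic data, and the exponent in the n-dependent choice of lambda below are illustrative assumptions only; the paper ties the optimal exponent to the source condition and the effective dimension.

```python
# Minimal sketch (not the paper's construction): kernel ridge regression on
# n samples, with the regularization parameter chosen as a decreasing
# function of n. The exponent 0.5 is a placeholder; the optimal exponent
# depends on the source condition and the effective dimension.
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def krr_fit(X, y, lam):
    n = len(X)
    K = gaussian_kernel(X, X)
    # Tikhonov-regularized least squares: (K + n*lam*I) alpha = y
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def krr_predict(X_train, alpha, X_test):
    return gaussian_kernel(X_test, X_train) @ alpha

rng = np.random.default_rng(0)
n = 200
X = rng.uniform(-1, 1, size=(n, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(n)

lam = n ** (-0.5)                      # illustrative n-dependent choice
alpha = krr_fit(X, y, lam)
X_test = np.linspace(-1, 1, 5).reshape(-1, 1)
print(krr_predict(X, alpha, X_test))
```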

Optimal Rates for Regularization Operators in Learning Theory

TLDR
It is shown that by a suitable choice of the regularization parameter as a function of the number of the available examples, it is possible to attain the optimal minimax rates of convergence for the expected squared loss of the estimators, over the family of priors fulfilling the constraint r + s > 1/2.

Optimal Rates for Spectral-regularized Algorithms with Least-Squares Regression over Hilbert Spaces

TLDR
This paper investigates a class of spectral-regularized algorithms, including ridge regression, principal component analysis, and gradient methods, and proves optimal, high-probability convergence results in terms of variants of norms for the studied algorithms, considering a capacity assumption on the hypothesis space and a general source condition on the target function.
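
A minimal sketch of the spectral-regularization viewpoint referenced above: different filters applied to the eigenvalues of the kernel matrix recover ridge regression (Tikhonov), spectral cut-off (principal component regression), and gradient descent (Landweber iteration). The kernel, data, and constants (lambda, cut-off level, step size, iteration count) are illustrative, not the paper's choices.

```python
# Spectral regularization on the eigendecomposition K = U diag(sig) U^T.
# Each filter g approximates the inverse of K, and the estimator is
# f(x) = sum_j alpha_j k(x, x_j) with alpha = g(K) y.
import numpy as np

def spectral_coefficients(K, y, g):
    sig, U = np.linalg.eigh(K)
    filtered = g(np.maximum(sig, 0.0))
    return U @ (filtered * (U.T @ y))

def tikhonov(sig, lam, n):
    # ridge regression: g(s) = 1 / (s + n*lam)
    return 1.0 / (sig + n * lam)

def cutoff(sig, thresh):
    # spectral cut-off / principal component regression:
    # invert only the eigenvalues above the threshold
    out = np.zeros_like(sig)
    keep = sig > thresh
    out[keep] = 1.0 / sig[keep]
    return out

def landweber(sig, eta, t):
    # t steps of gradient descent with step eta:
    # g(s) = eta * sum_{k<t} (1 - eta*s)^k = (1 - (1 - eta*s)^t) / s
    safe = np.where(sig > 1e-12, sig, 1.0)
    g = (1.0 - (1.0 - eta * sig) ** t) / safe
    return np.where(sig > 1e-12, g, eta * t)

rng = np.random.default_rng(1)
n = 100
X = rng.uniform(-1, 1, (n, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(n)
K = np.exp(-((X - X.T) ** 2) / 0.5)          # Gaussian kernel, 1-D inputs

alpha_ridge = spectral_coefficients(K, y, lambda s: tikhonov(s, 1e-2, n))
alpha_pcr   = spectral_coefficients(K, y, lambda s: cutoff(s, 1e-1))
alpha_gd    = spectral_coefficients(K, y, lambda s: landweber(s, 1.0 / n, 200))
```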

Optimal Learning Rates for Regularized Least-Squares with a Fourier Capacity Condition

We derive minimax adaptive rates for a new, broad class of Tikhonov-regularized learning problems in Hilbert scales under general source conditions. Our analysis does not require the regression

Optimal rates for spectral algorithms with least-squares regression over Hilbert spaces

Adaptativity of Stochastic Gradient Descent

TLDR
In a stochastic approximation framework where the estimator is updated after each observation, it is shown that the averaged, unregularized least-mean-squares algorithm, given a sufficiently large step size, attains optimal rates of convergence across a variety of regimes for the smoothness of the optimal prediction function and of the functions in H.
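
A minimal sketch of the averaged, unregularized least-mean-squares recursion described above, written in the finite-dimensional linear setting rather than the RKHS setting of the paper; the constant step size below is a standard conservative choice, stated here as an assumption.

```python
# Averaged least-mean-squares: one pass over the data with a constant step
# size, returning the Polyak-Ruppert average of the iterates. The step size
# is taken on the order of 1 / E||x||^2 (estimated empirically), a standard
# conservative choice and an assumption of this sketch.
import numpy as np

def averaged_lms(X, y, gamma):
    n, d = X.shape
    theta = np.zeros(d)
    theta_bar = np.zeros(d)
    for t in range(n):
        x_t, y_t = X[t], y[t]
        # stochastic gradient step on the squared loss of one observation
        theta -= gamma * (x_t @ theta - y_t) * x_t
        # running average of the iterates
        theta_bar += (theta - theta_bar) / (t + 1)
    return theta_bar

rng = np.random.default_rng(0)
n, d = 10_000, 20
X = rng.standard_normal((n, d))
theta_star = rng.standard_normal(d)
y = X @ theta_star + 0.1 * rng.standard_normal(n)

gamma = 1.0 / (4 * np.mean(np.sum(X**2, axis=1)))   # conservative constant step
theta_hat = averaged_lms(X, y, gamma)
print(np.linalg.norm(theta_hat - theta_star))
```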

Optimal Rates for Regularization of Statistical Inverse Learning Problems

TLDR
Strong and weak minimax optimal rates of convergence (as the number of observations n grows large) for a large class of spectral regularization methods over regularity classes defined through appropriate source conditions are obtained.

Distributed Learning with Regularized Least Squares

TLDR
It is shown, with error bounds in expectation, that the global output function of distributed learning with the least-squares regularization scheme in a reproducing kernel Hilbert space is a good approximation to the output of the algorithm that processes the whole data set on a single machine.
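
A minimal sketch of the divide-and-conquer scheme described above: the data are split into m disjoint blocks, kernel ridge regression is solved on each block, and the local outputs are averaged. The block count, kernel, and regularization parameter below are illustrative.

```python
# Distributed regularized least squares (divide-and-conquer kernel ridge
# regression): fit one local KRR model per data block, average predictions.
import numpy as np

def gauss_K(A, B, sigma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def local_krr(Xb, yb, lam):
    nb = len(Xb)
    alpha = np.linalg.solve(gauss_K(Xb, Xb) + nb * lam * np.eye(nb), yb)
    return Xb, alpha

def distributed_krr(X, y, lam, m):
    blocks = np.array_split(np.arange(len(X)), m)
    return [local_krr(X[b], y[b], lam) for b in blocks]

def predict(models, X_test):
    # global output = average of the m local outputs
    preds = [gauss_K(X_test, Xb) @ ab for Xb, ab in models]
    return np.mean(preds, axis=0)

rng = np.random.default_rng(2)
n = 1200
X = rng.uniform(-1, 1, (n, 1))
y = np.cos(2 * X[:, 0]) + 0.1 * rng.standard_normal(n)
models = distributed_krr(X, y, lam=1e-3, m=4)
print(predict(models, np.array([[0.0], [0.5]])))
```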

Optimal Rates of Sketched-regularized Algorithms for Least-Squares Regression over Hilbert Spaces

TLDR
These results are the first ones with optimal, distribution-dependent rates that do not have any saturation effect for sketched/Nyström regularized algorithms, considering both the attainable and non-attainable cases.
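
A minimal sketch of a Nyström-regularized least-squares estimator of the kind discussed above: the kernel is restricted to m uniformly sampled landmark points and the resulting m-dimensional regularized problem is solved. The landmark count and lambda below are illustrative assumptions.

```python
# Nystrom-regularized least squares: approximate the kernel with m landmark
# points sampled uniformly without replacement, then solve an m-dimensional
# regularized system instead of the full n-dimensional one.
import numpy as np

def gauss_K(A, B, sigma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def nystrom_krr(X, y, m, lam, rng):
    n = len(X)
    idx = rng.choice(n, size=m, replace=False)   # uniform landmark sampling
    Xm = X[idx]
    K_nm = gauss_K(X, Xm)                        # n x m cross-kernel
    K_mm = gauss_K(Xm, Xm)                       # m x m landmark kernel
    # Solve (K_nm^T K_nm + n*lam*K_mm) beta = K_nm^T y
    A = K_nm.T @ K_nm + n * lam * K_mm
    beta = np.linalg.solve(A + 1e-10 * np.eye(m), K_nm.T @ y)
    return Xm, beta

def predict(Xm, beta, X_test):
    return gauss_K(X_test, Xm) @ beta

rng = np.random.default_rng(3)
n = 2000
X = rng.uniform(-1, 1, (n, 1))
y = np.sin(4 * X[:, 0]) + 0.1 * rng.standard_normal(n)
Xm, beta = nystrom_krr(X, y, m=100, lam=1e-3, rng=rng)
print(predict(Xm, beta, np.array([[0.0], [0.25]])))
```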

Parallelizing Spectrally Regularized Kernel Algorithms

TLDR
It is shown that minimax optimal rates of convergence are preserved if the number of data partitions m grows sufficiently slowly as n→∞ (equivalently, under an upper bound on the exponent α governing this growth), depending on the smoothness assumptions on f and the intrinsic dimensionality.

Convergences of Regularized Algorithms and Stochastic Gradient Methods with Random Projections

TLDR
The least-squares regression problem over a Hilbert space is studied, covering nonparametric regression over a reproducing kernel Hilbert space as a special case, and optimal rates are obtained for regularized algorithms with randomized sketches, provided that the sketch dimension is proportional to the effective dimension up to a logarithmic factor.
...

References

Showing 1-10 of 51 references

Model Selection for Regularized Least-Squares Algorithm in Learning Theory

TLDR
Under suitable smoothness conditions on the regression function, the optimal regularization parameter is estimated as a function of the number of examples, and it is proved that this choice ensures consistency of the algorithm.

On Learning Vector-Valued Functions

TLDR
This letter provides a study of learning in a Hilbert space of vector-valued functions and derives the form of the minimal norm interpolant to a finite set of data and applies it to study some regularization functionals that are important in learning theory.
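
A minimal sketch of a vector-valued interpolant of the kind studied in this letter, under the additional assumption of a separable operator-valued kernel Γ(x, x') = k(x, x')·B with B a symmetric positive-definite matrix coupling the output components; the general theory allows broader operator-valued kernels, so the separable form and the particular B below are illustrative.

```python
# Minimal-norm interpolant for vector-valued outputs with a separable
# operator-valued kernel Gamma(x, x') = k(x, x') * B, B symmetric positive
# definite (T x T). Coefficients C (n x T) satisfy K C B = Y, and
# f(x) = sum_j k(x, x_j) * B @ C[j].
import numpy as np

def scalar_kernel(A, Bm, sigma=0.7):
    d2 = ((A[:, None, :] - Bm[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def vv_interpolant(X, Y, B, reg=1e-8):
    K = scalar_kernel(X, X)
    n, T = Y.shape
    # Solve K C B = Y in two steps (small ridge for numerical stability)
    C = np.linalg.solve(K + reg * np.eye(n), Y)
    C = np.linalg.solve(B + reg * np.eye(T), C.T).T
    return C

def vv_predict(X_train, C, B, X_test):
    return scalar_kernel(X_test, X_train) @ C @ B

rng = np.random.default_rng(4)
n, T = 50, 3
X = rng.uniform(-1, 1, (n, 2))
Y = np.column_stack([np.sin(X @ w) for w in rng.standard_normal((T, 2))])
B = 0.5 * np.eye(T) + 0.5 * np.ones((T, T)) / T   # couples the T outputs
C = vv_interpolant(X, Y, B)
print(vv_predict(X, C, B, X[:2]))   # reproduces Y[:2] up to the small ridge
```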

Learning from Examples as an Inverse Problem

TLDR
A natural extension of the analysis of Tikhonov regularization to the continuous (population) case, together with a study of the interplay between the discrete and continuous problems, draws a clear connection between the consistency approach in learning theory and the stability and convergence properties of regularization for ill-posed inverse problems.

Effective Dimension and Generalization of Kernel Learning

TLDR
A concept of scale-sensitive effective data dimension is introduced; it is shown that it characterizes the convergence rate of the underlying learning problem and naturally extends results for parametric estimation in finite-dimensional spaces to nonparametric kernel learning methods.
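
A minimal sketch of the (empirical) effective dimension N(λ) = trace(K (K + nλI)^{-1}), the capacity quantity that also drives the parameter choice in the main paper above; the kernel and data below are illustrative.

```python
# Empirical effective dimension N(lambda) = trace( K (K + n*lambda*I)^{-1} ),
# computed from the eigenvalues of the kernel matrix. It decreases from n
# (lambda -> 0) toward 0 (lambda -> infinity), measuring the capacity of the
# hypothesis space at regularization scale lambda.
import numpy as np

def effective_dimension(K, lam):
    n = K.shape[0]
    sig = np.maximum(np.linalg.eigvalsh(K), 0.0)
    return float(np.sum(sig / (sig + n * lam)))

rng = np.random.default_rng(5)
X = rng.uniform(-1, 1, (300, 1))
K = np.exp(-((X - X.T) ** 2) / 0.5)          # Gaussian kernel, 1-D inputs
for lam in [1e-1, 1e-2, 1e-3, 1e-4]:
    print(lam, effective_dimension(K, lam))
```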

Risk Bounds for Regularized Least-squares Algorithm with Operator-valued kernels

Abstract: We show that recent results in [3] on risk bounds for regularized least-squares on reproducing kernel Hilbert spaces can be straightforwardly extended to the vector-valued regression

Mathematical Methods for Supervised Learning

TLDR
The main focus is to understand the rate of approximation, measured either in expectation or in probability, that can be obtained under a given prior fρ ∈ Θ, and what algorithms can obtain optimal or semi-optimal results.

Approximation in Learning Theory

This paper addresses some problems of supervised learning in the setting formulated by Cucker and Smale. Supervised learning, or learning-from-examples, refers to a process that builds on the base of

Learning Multiple Tasks with Kernel Methods

TLDR
The experiments show that learning multiple related tasks simultaneously using the proposed approach can significantly outperform standard single-task learning, particularly when there are many related tasks but few data per task.
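
The "proposed approach" above relies on a kernel that couples tasks. A common construction of this kind, stated here as an assumption and not necessarily the paper's exact kernel, multiplies a scalar input kernel by an inter-task coupling term, interpolating between pooling all tasks into one problem and fully independent single-task learning.

```python
# Illustrative multitask kernel (an assumption of this sketch):
#   K((x, s), (x', t)) = k(x, x') * (omega + (1 - omega) * [s == t]).
# omega = 1 pools all tasks; omega = 0 recovers independent single-task
# learning; intermediate values share information across related tasks.
import numpy as np

def multitask_gram(X, tasks, omega, sigma=0.5):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    k = np.exp(-d2 / (2 * sigma**2))
    same_task = (tasks[:, None] == tasks[None, :]).astype(float)
    return k * (omega + (1 - omega) * same_task)

def multitask_krr(X, tasks, y, omega, lam):
    n = len(X)
    K = multitask_gram(X, tasks, omega)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

rng = np.random.default_rng(6)
n_per_task, T = 30, 4
X = rng.uniform(-1, 1, (n_per_task * T, 1))
tasks = np.repeat(np.arange(T), n_per_task)
# related tasks: shared trend plus a small task-specific shift
y = np.sin(3 * X[:, 0]) + 0.2 * tasks + 0.1 * rng.standard_normal(len(X))
alpha = multitask_krr(X, tasks, y, omega=0.5, lam=1e-2)
print(alpha[:5])
```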

Extensions of a Theory of Networks for Approximation and Learning

TLDR
A theoretical framework for approximation based on regularization techniques leads to a class of three-layer networks called Generalized Radial Basis Functions (GRBF), which are not only equivalent to generalized splines but are also closely related to several pattern recognition methods and neural network algorithms.
...