# Learning Theory for Distribution Regression

@article{Szab2014LearningTF, title={Learning Theory for Distribution Regression}, author={Zolt{\'a}n Szab{\'o} and Bharath K. Sriperumbudur and Barnab{\'a}s P{\'o}czos and Arthur Gretton}, journal={J. Mach. Learn. Res.}, year={2014}, volume={17}, pages={152:1-152:40} }

We focus on the distribution regression problem: regressing to vector-valued outputs from probability measures. Many important machine learning and statistical tasks fit into this framework, including multi-instance learning and point estimation problems without analytical solution (such as hyperparameter or entropy estimation). Despite the large number of available heuristics in the literature, the inherent two-stage sampled nature of the problem makes the theoretical analysis quite…
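The two-stage sampled setting can be sketched as follows. Each training input is a bag of samples from an unobserved distribution. The bags are represented by their empirical mean embeddings, and ridge regression is run on those embeddings. This is a minimal stdlib-only sketch, not the authors' code. The following are all illustrative assumptions: a Gaussian base kernel on R, a Gaussian kernel on the embeddings, the toy task (regressing a bag to the mean of the normal distribution that generated it), and the values of `gamma`, `sigma2` and `lam`. The helper names are hypothetical.

```python
import math
import random


def base_kernel(x, y, gamma=1.0):
    # Gaussian kernel k on the input space R (illustrative choice).
    return math.exp(-gamma * (x - y) ** 2)


def embed_inner(xs, ys, gamma=1.0):
    # <mu_hat_P, mu_hat_Q> in the RKHS of k: average of k over all sample pairs.
    total = sum(base_kernel(a, b, gamma) for a in xs for b in ys)
    return total / (len(xs) * len(ys))


def outer_kernel(xs, ys, sq_x, sq_y, sigma2):
    # Gaussian kernel K on embeddings, using ||mu_P - mu_Q||^2 = <P,P> + <Q,Q> - 2<P,Q>.
    dist2 = sq_x + sq_y - 2.0 * embed_inner(xs, ys)
    return math.exp(-dist2 / (2.0 * sigma2))


def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def fit(bags, targets, lam=1e-3, sigma2=0.1):
    # Ridge regression on embeddings: alpha = (K + n_bags * lam * I)^{-1} y.
    sq = [embed_inner(b, b) for b in bags]
    n_bags = len(bags)
    K = [[outer_kernel(bags[i], bags[j], sq[i], sq[j], sigma2) for j in range(n_bags)]
         for i in range(n_bags)]
    for i in range(n_bags):
        K[i][i] += n_bags * lam
    return solve(K, targets), sq


def predict(bag, bags, alpha, sq, sigma2=0.1):
    # f(Q) = sum_i alpha_i K(P_i, Q), using only the samples observed from Q.
    sq_new = embed_inner(bag, bag)
    return sum(a * outer_kernel(b, bag, s, sq_new, sigma2)
               for a, b, s in zip(alpha, bags, sq))


# Toy two-stage data: bag i holds 30 draws from N(m_i, 1); the target is m_i.
rng = random.Random(0)
means = [rng.uniform(-2.0, 2.0) for _ in range(20)]
bags = [[rng.gauss(m, 1.0) for _ in range(30)] for m in means]
alpha, sq = fit(bags, means)
test_bag = [rng.gauss(0.5, 1.0) for _ in range(100)]
pred = predict(test_bag, bags, alpha, sq)
```

Both sampling stages show up in the toy data. The learner never sees the distributions $N(m_i, 1)$. It only sees finite bags drawn from them, so the embeddings it regresses on are themselves estimates.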

## 104 Citations

### Distribution Regression with Sliced Wasserstein Kernels

- Computer Science · ICML
- 2022

The theoretical properties of a kernel ridge regression estimator built on an optimal transport (OT) based representation of distributions are studied, and universal consistency and excess risk bounds are proved for this estimator.
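A sliced Wasserstein kernel can be sketched as follows. The samples are projected onto random directions. The closed-form 1D Wasserstein distance between each pair of projected samples is computed, the results are averaged, and a kernel is applied to that average. This is a minimal sketch, not the paper's estimator. The following are assumptions: equal sample sizes, a Monte Carlo average over `n_proj` random directions, and the Gaussian-type form $\exp(-\gamma\, SW_2^2)$. The function names are hypothetical.

```python
import math
import random


def w2_sq_1d(a, b):
    # Squared 2-Wasserstein distance between two 1D empirical measures with the
    # same number of atoms: pair up sorted samples (monotone coupling is optimal in 1D).
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum((x - y) ** 2 for x, y in zip(sorted(a), sorted(b))) / len(a)


def random_direction(dim, rng):
    # Uniform direction on the unit sphere via a normalised Gaussian vector.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sliced_w2_sq(X, Y, n_proj=50, seed=0):
    # Monte Carlo estimate of SW_2^2: average the 1D W_2^2 over random projections.
    # A fixed seed reuses the same directions for every pair of samples.
    rng = random.Random(seed)
    dim = len(X[0])
    total = 0.0
    for _ in range(n_proj):
        theta = random_direction(dim, rng)
        px = [sum(t * c for t, c in zip(theta, x)) for x in X]
        py = [sum(t * c for t, c in zip(theta, y)) for y in Y]
        total += w2_sq_1d(px, py)
    return total / n_proj


def sw_kernel(X, Y, gamma=1.0, n_proj=50, seed=0):
    # Gaussian-type kernel on the sliced distance (assumed form).
    return math.exp(-gamma * sliced_w2_sq(X, Y, n_proj, seed))


rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(40)]
Y = [[x0 + 1.0, x1] for x0, x1 in X]  # X shifted by the unit vector (1, 0)
k_same = sw_kernel(X, X)
sw_shift = sliced_w2_sq(X, Y)
```

Consider a pure shift $s$. Every projection moves by exactly $\theta \cdot s$, so each 1D term equals $(\theta \cdot s)^2 \le \|s\|^2$. This makes the shifted example an easy sanity check.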

### Robust kernel-based distribution regression

- Mathematics, Computer Science · Inverse Problems
- 2021

By introducing a robust loss function $l_\sigma$ for two-stage sampling problems, this paper presents a novel robust distribution regression (RDR) scheme with a windowing function $V$ and a scaling parameter $\sigma$. The scaling parameter is shown to be crucial for providing robustness and satisfactory learning rates of RDR.
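The role of the scaling parameter can be illustrated with a windowed loss. This is a minimal sketch, assuming the form $l_\sigma(u) = \sigma^2 V(u^2/\sigma^2)$ and using the Welsch function $V(t) = 1 - e^{-t}$ as a hypothetical windowing function. The paper's exact conditions on $V$ may differ. For residuals small relative to $\sigma$, the loss behaves like the squared loss. For large residuals it is capped near $\sigma^2$, which limits the influence of outliers.

```python
import math


def welsch_window(t):
    # Hypothetical windowing function V(t) = 1 - exp(-t): V(0) = 0, V'(0) = 1, bounded by 1.
    return 1.0 - math.exp(-t)


def robust_loss(u, sigma, window=welsch_window):
    # Assumed form l_sigma(u) = sigma^2 * V(u^2 / sigma^2).
    return sigma ** 2 * window(u ** 2 / sigma ** 2)


small = robust_loss(0.1, sigma=10.0)   # residual << sigma: close to u^2 = 0.01
large = robust_loss(100.0, sigma=1.0)  # residual >> sigma: capped near sigma^2 = 1
```

As $\sigma \to \infty$ the loss approaches least squares. A smaller $\sigma$ trades statistical efficiency for robustness.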

### Coefficient-based Regularized Distribution Regression

- Computer Science, Mathematics · arXiv
- 2022

The algorithm under consideration provides a simple paradigm for designing indefinite kernel methods, enriching the study of distribution regression. Under mild conditions it attains optimal rates that match the one-stage sampled minimax optimal rate.
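Coefficient-based regularization penalizes the expansion coefficients $\alpha$ of $f = \sum_i \alpha_i K(x_i, \cdot)$ rather than an RKHS norm. This is why the kernel need not be positive semi-definite. The sketch below is minimal and rests on several assumptions: the objective $\frac{1}{m}\|K\alpha - y\|^2 + \lambda m \|\alpha\|^2$ (the paper's scaling may differ), a $\tanh$ kernel as an example of a kernel that is indefinite in general, and scalar inputs standing in for mean embeddings. In the distribution setting, `x * y` would become an inner product of embeddings.

```python
import math


def sigmoid_kernel(x, y, a=1.0, c=-1.0):
    # tanh ("sigmoid") kernel; in general it is not positive semi-definite.
    return math.tanh(a * x * y + c)


def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def coef_regularized_fit(xs, ys, lam, kernel=sigmoid_kernel):
    # Minimise (1/m) * ||K alpha - y||^2 + lam * m * ||alpha||^2 (assumed scaling).
    m = len(xs)
    K = [[kernel(xi, xj) for xj in xs] for xi in xs]
    # Normal equations: (K^T K + lam * m^2 * I) alpha = K^T y.
    # K^T K is PSD even when K is indefinite, so the system is well posed for lam > 0.
    KtK = [[sum(K[r][i] * K[r][j] for r in range(m)) for j in range(m)] for i in range(m)]
    for i in range(m):
        KtK[i][i] += lam * m * m
    Kty = [sum(K[r][i] * ys[r] for r in range(m)) for i in range(m)]
    return solve(KtK, Kty), K


xs = [i / 5.0 - 1.0 for i in range(10)]
ys = [x * x for x in xs]
lam = 0.01
alpha, K = coef_regularized_fit(xs, ys, lam)
```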

### Estimates on Learning Rates for Multi-Penalty Distribution Regression

- Computer Science · arXiv
- 2020

A novel multi-penalty regularization algorithm is presented to capture more features of distribution regression, and optimal learning rates are derived for it. A distributed learning algorithm is also proposed to handle the challenges of large-scale data.

### Wasserstein Regression

- Mathematics · Journal of the American Statistical Association
- 2021

The analysis of samples of random objects that do not lie in a vector space has found increasing attention in statistics in recent years. An important class of such object data is univariate…

### Optimal learning rates for distribution regression

- Computer Science, Mathematics · J. Complex.
- 2020

### Proposal: Scalable, Active and Flexible Learning on Distributions

- Computer Science
- 2015

This thesis proposal investigates approximate embeddings into Euclidean spaces whose inner products approximate kernel values between the source distributions. It also proposes adapting recent kernel learning techniques to the distributional setting, which allows good kernels for the task at hand to be selected automatically.
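One common way to build such Euclidean embeddings uses random Fourier features. Averaging the features over a sample gives a finite-dimensional approximate mean embedding, and inner products of these vectors approximate the kernel between the two samples. This is a minimal sketch of that general idea, not the specific embeddings developed in the proposal. The following are assumptions: the Gaussian base kernel $k(x, y) = e^{-\gamma (x - y)^2}$ on R, the feature count, and the helper names.

```python
import math
import random


def make_rff(n_features, gamma, seed=0):
    # Random Fourier features for k(x, y) = exp(-gamma * (x - y)^2) on R:
    # w ~ N(0, 2 * gamma), b ~ U[0, 2 pi], z(x) = sqrt(2 / D) * cos(w x + b).
    rng = random.Random(seed)
    ws = [rng.gauss(0.0, math.sqrt(2.0 * gamma)) for _ in range(n_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x):
        return [scale * math.cos(w * x + b) for w, b in zip(ws, bs)]

    return features


def embed(sample, features):
    # Approximate mean embedding: average of the feature vectors over the sample.
    vecs = [features(x) for x in sample]
    n = len(vecs)
    return [sum(col) / n for col in zip(*vecs)]


def exact_inner(xs, ys, gamma):
    # Exact inner product of empirical mean embeddings, for comparison.
    return sum(math.exp(-gamma * (a - b) ** 2) for a in xs for b in ys) / (len(xs) * len(ys))


rng = random.Random(1)
P = [rng.gauss(0.0, 1.0) for _ in range(50)]
Q = [rng.gauss(1.0, 1.0) for _ in range(50)]
feat = make_rff(2000, gamma=0.5, seed=2)
approx = sum(a * b for a, b in zip(embed(P, feat), embed(Q, feat)))
exact = exact_inner(P, Q, 0.5)
```

The exact value needs all $n^2$ kernel evaluations per pair of samples. The feature version costs $O(nD)$ per sample, and any linear method can then work directly on the $D$-dimensional vectors.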

### Wasserstein Regression

- Mathematics
- 2021

The analysis of samples of random objects that do not lie in a vector space is gaining increasing attention in statistics. An important class of such object data is univariate probability measures…

### Optimal Rates of Distributed Regression with Imperfect Kernels

- Computer Science · J. Mach. Learn. Res.
- 2021

This paper establishes a general framework for analyzing distributed regression with response-weighted base algorithms. It does so by bounding the error of such algorithms on a single data set, provided that the error bounds account for the impact of the unexplained variance of the response variable.

### Bayesian Distribution Regression

- Computer Science · arXiv
- 2017

This work constructs a Bayesian distribution regression formalism that accounts for uncertainty in observations due to sampling variability in groups, improving the robustness and performance of the model when group sizes vary.

## References

Showing 1-10 of 118 references.

### Linear-Time Learning on Distributions with Approximate Kernel Embeddings

- Computer Science · AAAI
- 2016

This work develops the first random features for probability density functions (pdfs) whose dot products approximate kernels built on non-Euclidean metrics between densities. These features allow estimators to scale to large datasets by working in a primal space without computing large Gram matrices.

### Fast Randomized Kernel Ridge Regression with Statistical Guarantees

- Computer Science · NIPS
- 2015

A version of the kernel-matrix sketching approach is described that comes with running-time guarantees as well as improved guarantees on its statistical performance. A fast algorithm is also presented that computes coarse approximations to the underlying (ridge leverage) scores in time linear in the number of samples.
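The scores that drive the sampling can be computed exactly for small problems. The sketch below is minimal and assumes the commonly used $\lambda$-ridge leverage scores $l_i(\lambda) = [K(K + n\lambda I)^{-1}]_{ii}$ with a Gaussian kernel. It computes the scores directly in $O(n^3)$ time, not with the paper's fast approximation, and the helper names are hypothetical. Points that are isolated under the kernel receive high scores and are therefore more likely to be kept in the sketch.

```python
import math


def gaussian_gram(xs, gamma=1.0):
    # Gram matrix of the Gaussian kernel exp(-gamma * (a - b)^2).
    return [[math.exp(-gamma * (a - b) ** 2) for b in xs] for a in xs]


def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def ridge_leverage_scores(K, lam):
    # l_i(lam) = [K (K + n lam I)^{-1}]_{ii}; every score lies in [0, 1).
    n = len(K)
    A = [[K[i][j] + (n * lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    scores = []
    for i in range(n):
        # Column i of (K + n lam I)^{-1} K, which equals K (K + n lam I)^{-1}
        # because the two matrices commute; keep only its diagonal entry.
        col = solve(A, [K[r][i] for r in range(n)])
        scores.append(col[i])
    return scores


xs = [0.0, 0.05, 0.1, 0.15, 0.2, 3.0]  # a tight cluster plus one isolated point
scores = ridge_leverage_scores(gaussian_gram(xs), lam=0.01)
probs = [s / sum(scores) for s in scores]  # sampling distribution over columns
```

The sum of the scores is the effective dimension $\operatorname{tr}(K(K + n\lambda I)^{-1})$. This is the quantity that governs how many columns a leverage-based sketch needs.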

### Fast Distribution To Real Regression

- Computer Science, Mathematics · AISTATS
- 2014

The Double-Basis estimator is proposed to address a tension in distribution-to-real regression. A low estimation risk may require a large amount of data, but the computational cost of estimation becomes infeasible when the dataset is too large.

### Probability Product Kernels

- Computer Science · J. Mach. Learn. Res.
- 2004

The advantages of discriminative learning algorithms and kernel machines are combined with generative modeling using a novel kernel between distributions to exploit the properties, metrics and invariances of the generative models the authors infer from each datum.
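A probability product kernel has the form $k(p, q) = \int p(x)^\rho q(x)^\rho \, dx$. For Gaussian models it has closed forms. The sketch below covers 1D Gaussians with $\rho = 1$ (the expected likelihood kernel, equal to the $N(\mu_1; \mu_2, \sigma_1^2 + \sigma_2^2)$ density) and $\rho = 1/2$ (the Bhattacharyya kernel), and checks both against trapezoidal integration. It is a minimal illustration restricted to the univariate case, and the function names and integration grid are illustrative choices.

```python
import math


def gauss_pdf(x, mu, var):
    # Density of N(mu, var) at x.
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def expected_likelihood_kernel(mu1, var1, mu2, var2):
    # rho = 1: the integral of p * q equals the N(mu1; mu2, var1 + var2) density.
    return gauss_pdf(mu1, mu2, var1 + var2)


def bhattacharyya_kernel(mu1, var1, mu2, var2):
    # rho = 1/2: the integral of sqrt(p * q) for two 1D Gaussians.
    s = var1 + var2
    return math.sqrt(2.0 * math.sqrt(var1 * var2) / s) * math.exp(-(mu1 - mu2) ** 2 / (4.0 * s))


def product_kernel_numeric(mu1, var1, mu2, var2, rho, lo=-30.0, hi=30.0, n=20001):
    # Trapezoidal rule for the integral of (p(x) q(x))^rho over [lo, hi].
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * (gauss_pdf(x, mu1, var1) * gauss_pdf(x, mu2, var2)) ** rho
    return total * h


el = expected_likelihood_kernel(0.0, 1.0, 1.0, 2.0)
bc = bhattacharyya_kernel(0.0, 1.0, 1.0, 2.0)
```

In the probability product kernel approach, a generative model is first fitted to each datum. The kernel is then evaluated between the fitted models, which here are the Gaussian parameters.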

### Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces

- Computer Science · J. Mach. Learn. Res.
- 2004

A novel method of dimensionality reduction for supervised learning problems is proposed that requires neither assumptions on the marginal distribution of X nor a parametric model of the conditional distribution of Y. The work also establishes a general nonparametric characterization of conditional independence using covariance operators on reproducing kernel Hilbert spaces.

### Learning from Distributions via Support Measure Machines

- Computer Science · NIPS
- 2012

A kernel-based discriminative learning framework on probability measures is presented that learns from a collection of probability distributions constructed to meaningfully represent the training data. The work also proposes a flexible SVM (Flex-SVM) that places a different kernel function on each training example.

### Divide and conquer kernel ridge regression: a distributed algorithm with minimax optimal rates

- Computer Science, Mathematics · J. Mach. Learn. Res.
- 2015

It is established that statistical optimality is retained despite the computational speed-up. As long as the number of partitions m is not too large, the partition-based estimator achieves the statistical minimax rate over all estimators using the set of N samples.
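The divide-and-conquer scheme can be sketched in a few lines. The N samples are split into m disjoint subsets, kernel ridge regression is fitted independently on each subset, and the m predictions are averaged. This is a minimal sketch, not the paper's code. The following are assumptions: the Gaussian kernel, the fixed `lam`, the toy target $\sin x$, and the strided split. The strided split is a random partition here only because the inputs are drawn i.i.d. The paper's analysis concerns how m and the regularization level must be chosen, whereas here both are fixed by hand.

```python
import math
import random


def k(a, b, gamma=1.0):
    # Gaussian kernel on R (illustrative choice).
    return math.exp(-gamma * (a - b) ** 2)


def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def krr_fit(xs, ys, lam):
    # Kernel ridge regression: alpha = (K + n * lam * I)^{-1} y.
    n = len(xs)
    K = [[k(a, b) for b in xs] for a in xs]
    for i in range(n):
        K[i][i] += n * lam
    return solve(K, ys)


def krr_predict(x, xs, alpha):
    return sum(a * k(x, xi) for a, xi in zip(alpha, xs))


def dc_krr_predict(x, xs, ys, m, lam):
    # Split the data into m disjoint parts, fit KRR on each, average the predictions.
    preds = []
    for p in range(m):
        part_x, part_y = xs[p::m], ys[p::m]
        alpha = krr_fit(part_x, part_y, lam)
        preds.append(krr_predict(x, part_x, alpha))
    return sum(preds) / m


rng = random.Random(0)
xs = [rng.uniform(0.0, 3.0) for _ in range(120)]
ys = [math.sin(x) + rng.gauss(0.0, 0.1) for x in xs]
pred_split = dc_krr_predict(1.0, xs, ys, m=4, lam=1e-3)
pred_full = dc_krr_predict(1.0, xs, ys, m=1, lam=1e-3)
```

Each sub-problem solves an $(N/m) \times (N/m)$ system instead of an $N \times N$ one. With cubic-cost solvers this cuts the total work by roughly a factor of $m^2$.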

### Nonparametric Divergence Estimation with Applications to Machine Learning on Distributions

- Computer Science · UAI
- 2011

Divergence estimation algorithms are presented, and their application to machine learning tasks on distributions is described. Empirical results are shown on synthetic data, real-world images, and astronomical data sets.

### A Generalized Kernel Approach to Structured Output Learning

- Computer Science · ICML
- 2013

This work proposes a covariance-based operator-valued kernel that addresses two limitations: the decoupling between outputs in the image space and the inability to use a joint feature space. It also introduces a variant of the kernel dependency estimation (KDE) method based on the conditional covariance operator, which takes into account the effects of the input variables in addition to the correlation between the outputs.

### Randomized sketches for kernels: Fast and optimal non-parametric regression

- Computer Science, Mathematics · arXiv
- 2015

It is proved that it suffices to choose the sketch dimension $m$ proportional to the statistical dimension of the kernel matrix (modulo logarithmic factors). This yields fast and minimax optimal approximations to the kernel ridge regression (KRR) estimate for nonparametric regression.