• Corpus ID: 7565247

Learning Theory for Distribution Regression

Zoltán Szabó, Bharath K. Sriperumbudur, Barnabás Póczos, Arthur Gretton
J. Mach. Learn. Res.

We focus on the distribution regression problem: regressing to vector-valued outputs from probability measures. Many important machine learning and statistical tasks fit into this framework, including multi-instance learning and point estimation problems without analytical solution (such as hyperparameter or entropy estimation). Despite the large number of available heuristics in the literature, the inherent two-stage sampled nature of the problem makes the theoretical analysis quite… 
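
The two-stage sampling described above can be made concrete with a small sketch: each input distribution is seen only through a bag of samples, and only a finite number of such bags is available. The estimator below is one common choice and is an assumption here, since the truncated abstract does not name it: each bag is represented by its empirical kernel mean embedding, and ridge regression is run on inner products of those embeddings. The function names, the Gaussian kernel, the bandwidth, the regularization value, and the toy data are all illustrative.

```python
import numpy as np

def gaussian_gram(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def bag_kernel(bags_a, bags_b, bandwidth=1.0):
    """Inner products of empirical mean embeddings.

    Entry [i, j] is the average of k(x, x') over x in bag i and x' in bag j.
    """
    return np.array([[gaussian_gram(a, b, bandwidth).mean() for b in bags_b]
                     for a in bags_a])

def fit_ridge(bags, targets, reg=1e-3, bandwidth=1.0):
    """Kernel ridge regression on bags; returns the dual coefficients."""
    n = len(bags)
    gram = bag_kernel(bags, bags, bandwidth)
    return np.linalg.solve(gram + n * reg * np.eye(n), targets)

def predict(train_bags, coeffs, test_bags, bandwidth=1.0):
    """Predict targets for new bags from the fitted dual coefficients."""
    return bag_kernel(test_bags, train_bags, bandwidth) @ coeffs

# Toy problem (hypothetical): each bag holds 50 draws from N(mu, 1),
# and the regression target is the unobserved mean mu.
rng = np.random.default_rng(0)
means = rng.uniform(-2.0, 2.0, size=40)
bags = [rng.normal(mu, 1.0, size=(50, 1)) for mu in means]

coeffs = fit_ridge(bags[:30], means[:30])
preds = predict(bags[:30], coeffs, bags[30:])
```

Passing a two-dimensional `targets` array gives vector-valued outputs without changing the code, because `np.linalg.solve` accepts a matrix right-hand side.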

Distribution Regression with Sliced Wasserstein Kernels

The theoretical properties of a kernel ridge regression estimator based on an OT-based representation are studied, and universal consistency and excess risk bounds are proved for it.

Robust kernel-based distribution regression

By introducing a robust loss function $l_\sigma$ for two-stage sampling problems, this paper presents a novel robust distribution regression (RDR) scheme with a windowing function V and a scaling parameter σ, which is shown to be crucial in providing robustness and satisfactory learning rates for RDR.

Coefficient-based Regularized Distribution Regression

The algorithm under consideration provides a simple paradigm for designing indefinite kernel methods, which enriches the theme of distribution regression, and attains optimal rates under some mild conditions, matching the one-stage sampled minimax optimal rate.

Estimates on Learning Rates for Multi-Penalty Distribution Regression

A novel multi-penalty regularization algorithm that captures more features of distribution regression is presented, optimal learning rates for the algorithm are derived, and a distributed learning algorithm is proposed to meet the challenge of large-scale data or information.

Wasserstein Regression

The analysis of samples of random objects that do not lie in a vector space has found increasing attention in statistics in recent years. An important class of such object data is univariate

Optimal learning rates for distribution regression

Proposal: Scalable, Active and Flexible Learning on Distributions

This thesis investigates the approach of approximate embeddings into Euclidean spaces such that inner products in the embedding space approximate kernel values between the source distributions and proposes to adapt recent kernel learning techniques to the distributional setting, allowing the automatic selection of good kernels for the task at hand.

Wasserstein Regression

The analysis of samples of random objects that do not lie in a vector space is gaining increasing attention in statistics. An important class of such object data is univariate probability measures

Optimal Rates of Distributed Regression with Imperfect Kernels

This paper establishes a general framework for analyzing distributed regression with response-weighted base algorithms by bounding the error of such algorithms on a single data set, provided that the error bounds have factored in the impact of the unexplained variance of the response variable.

Bayesian Distribution Regression

This work constructs a Bayesian distribution regression formalism that accounts for uncertainty in observations due to sampling variability in groups, improving the robustness and performance of the model when group sizes vary.

Linear-Time Learning on Distributions with Approximate Kernel Embeddings

This work develops the first random features for pdfs whose dot product approximates kernels using these non-Euclidean metrics, allowing estimators to scale to large datasets by working in a primal space, without computing large Gram matrices.

Fast Randomized Kernel Ridge Regression with Statistical Guarantees

A version of this approach that comes with running time guarantees as well as improved guarantees on its statistical performance is described, and a fast algorithm is presented to quickly compute coarse approximations to these scores in time linear in the number of samples.

Fast Distribution To Real Regression

The Double-Basis estimator is proposed, which looks to alleviate the problem of distribution to real-value regression where a large amount of data may be necessary for a low estimation risk, but the computation cost of estimation becomes infeasible when the data-set is too large.

Probability Product Kernels

The advantages of discriminative learning algorithms and kernel machines are combined with generative modeling using a novel kernel between distributions to exploit the properties, metrics and invariances of the generative models the authors infer from each datum.

Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces

A novel method of dimensionality reduction for supervised learning problems that requires neither assumptions on the marginal distribution of X, nor a parametric model of the conditional distribution of Y, and establishes a general nonparametric characterization of conditional independence using covariance operators on reproducing kernel Hilbert spaces.

Learning from Distributions via Support Measure Machines

A kernel-based discriminative learning framework on probability measures that learns using a collection of probability distributions that have been constructed to meaningfully represent training data and proposes a flexible SVM (Flex-SVM) that places different kernel functions on each training example.

Divide and conquer kernel ridge regression: a distributed algorithm with minimax optimal rates

It is established that despite the computational speed-up, statistical optimality is retained: as long as m is not too large, the partition-based estimator achieves the statistical minimax rate over all estimators using the set of N samples.

Nonparametric Divergence Estimation with Applications to Machine Learning on Distributions

Estimation algorithms are presented, their application to machine learning tasks on distributions is described, and empirical results on synthetic data, real-world images, and astronomical data sets are shown.

A Generalized Kernel Approach to Structured Output Learning

This work proposes a covariance-based operator-valued kernel that allows for the decoupling between outputs in the image space and the inability to use a joint feature space, and introduces a variant of the KDE method based on the conditional covariance operator that in addition to the correlation between the outputs takes into account the effects of the input variables.

Randomized sketches for kernels: Fast and optimal non-parametric regression

It is proved that it suffices to choose the sketch dimension $m$ proportional to the statistical dimension (modulo logarithmic factors) of the kernel matrix, and fast and minimax optimal approximations to the KRR estimate for non-parametric regression are obtained.