Corpus ID: 225061997

Distribution Regression for Sequential Data

Maud Lemercier, Cristopher Salvi, Theodoros Damoulas, Edwin V. Bonilla, Terry Lyons
Distribution regression refers to the supervised learning problem where labels are only available for groups of inputs instead of individual inputs. In this paper, we develop a rigorous mathematical framework for distribution regression where inputs are complex data streams. Leveraging properties of the expected signature and a recent signature kernel trick for sequential data from stochastic analysis, we introduce two new learning techniques, one feature-based and the other kernel-based. Each…
Higher Order Kernel Mean Embeddings to Capture Filtrations of Stochastic Processes
A family of higher order kernel mean embeddings (KMEs) is introduced that generalizes the notion of KME and captures additional information related to the filtration of stochastic processes.
General Signature Kernels
Suppose that γ and σ are two continuous bounded variation paths which take values in a finite-dimensional inner product space V. The recent papers [18] and [6] respectively introduced the truncated…
Multi-resolution Spatial Regression for Aggregated Data with an Application to Crop Yield Prediction
We develop a new methodology for spatial regression of aggregated outputs on multi-resolution covariates. Such problems often occur with spatial data, for example in crop yield prediction, where the…
Path Signature Area-Based Causal Discovery in Coupled Time Series
  • Will Glad, Thomas Woolf
  • Mathematics, Computer Science
  • 2021
Coupled dynamical systems are frequently observed in nature, but often not well understood in terms of their causal structure without additional domain knowledge about the system. Especially when…
Signature asymptotics, empirical processes, and optimal transport
Rough path theory [1] provides one with the notion of signature, a graded family of tensors which characterise, up to a negligible equivalence class, an ordered stream of vector-valued data. In the…


Learning Theory for Distribution Regression
This paper studies a simple, analytically computable, ridge regression-based alternative to distribution regression, where the distributions are embedded to a reproducing kernel Hilbert space, and the regressor is learned from the embeddings to the outputs, establishing the consistency of the classical set kernel.
Supervised Learning by Training on Aggregate Outputs
This work presents a new twist on supervised learning where, instead of having the training set contain an individual output value for each input vector, the output values in the training set are only given in aggregate over a number of input vectors.
Bayesian Distribution Regression
This work constructs a Bayesian distribution regression formalism that accounts for uncertainty in observations due to sampling variability in groups, improving the robustness and performance of the model when group sizes vary.
Bayesian Approaches to Distribution Regression
This work frames their models in a neural network style, allowing for simple MAP inference using backpropagation to learn the parameters, as well as MCMC-based inference which can fully propagate uncertainty.
Variational Learning on Aggregate Outputs with Gaussian Processes
This work develops a framework which extends to several types of likelihoods, including the Poisson model for aggregated count data, and applies it to the fine-scale spatial modelling of malaria incidence, with over 1 million observations.
Learning from Distributions via Support Measure Machines
A kernel-based discriminative learning framework on probability measures that learns using a collection of probability distributions that have been constructed to meaningfully represent training data and proposes a flexible SVM (Flex-SVM) that places different kernel functions on each training example.
Kernels for sequentially ordered data
The experiments indicate that the signature-based sequential kernel framework may be a promising approach to learning with sequential data, such as time series, which allows one to avoid extensive manual pre-processing.
Bayesian Semi-supervised Learning with Graph Gaussian Processes
We propose a data-efficient Gaussian process-based Bayesian approach to the semi-supervised learning problem on graphs. The proposed model shows extremely competitive performance when compared to the…
Deep Sets
The main theorem characterizes the permutation invariant objective functions and provides a family of functions to which any permutation covariant objective function must belong, which enables the design of a deep network architecture that can operate on sets and which can be deployed on a variety of scenarios including both unsupervised and supervised learning tasks.
Multiple-Instance Regression with Structured Data
A multiple-instance regression algorithm that models internal bag structure to identify the items most relevant to the bag labels and provided predictions that were more accurate than those obtained with non-multiple-instance approaches or MIR methods that do not model the bag structure.