# Projected Statistical Methods for Distributional Data on the Real Line with the Wasserstein Metric

@article{Pegoraro2021ProjectedSM, title={Projected Statistical Methods for Distributional Data on the Real Line with the Wasserstein Metric}, author={Matteo Pegoraro and Mario Beraha}, journal={J. Mach. Learn. Res.}, year={2021}, volume={23}, pages={37:1-37:59} }

We present a novel class of projected methods for the statistical analysis of a data set of probability distributions on the real line, equipped with the 2-Wasserstein metric. We focus in particular on Principal Component Analysis (PCA) and regression. To define these models, we exploit a representation of the Wasserstein space closely related to its weak Riemannian structure, by mapping the data to a suitable linear space and using a metric projection operator to constrain the results in the…
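As a rough illustration of the two ingredients named in the abstract, the sketch below relies on a standard fact: on the real line, the 2-Wasserstein distance between two distributions equals the L2 distance between their quantile functions. It also shows one possible metric projection, the L2 projection onto nondecreasing sequences via pool-adjacent-violators. The function names `wasserstein2_1d` and `project_to_monotone`, the grid size, and the choice of nondecreasing quantile functions as the constraint set are assumptions made here for illustration; the truncated abstract does not confirm that this is the construction used in the paper.

```python
import numpy as np


def wasserstein2_1d(x, y, n_grid=200):
    """Approximate 2-Wasserstein distance between two empirical distributions on R.

    Hypothetical helper: evaluates both empirical quantile functions on a
    common grid of probability levels and takes their discretized L2 distance.
    """
    levels = (np.arange(n_grid) + 0.5) / n_grid  # midpoints of (0, 1)
    qx = np.quantile(x, levels)
    qy = np.quantile(y, levels)
    return float(np.sqrt(np.mean((qx - qy) ** 2)))


def project_to_monotone(values):
    """L2 projection of a sequence onto nondecreasing sequences.

    Uses the pool-adjacent-violators idea: neighbouring blocks whose means
    violate the ordering are merged and replaced by their common mean.
    """
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge while the previous block mean exceeds the last block mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.normal(size=500)
    # Shifting a sample by a constant shifts its quantile function by the same constant.
    print(wasserstein2_1d(sample, sample + 3.0))
    # A non-monotone candidate is pulled back to the nearest nondecreasing sequence.
    print(project_to_monotone([1.0, 3.0, 2.0, 4.0]))
```

In a setting like the one the abstract describes, a linear method such as PCA applied to quantile functions can return a function that is not monotone; a projection of this kind is one way to map it back to a valid quantile function.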

## 5 Citations

### Fast PCA in 1-D Wasserstein Spaces via B-splines Representation and Metric Projection

- Computer Science · AAAI
- 2021

A novel definition of Principal Component Analysis in the Wasserstein space is proposed. It yields a straightforward optimization problem that is fast to solve and performs similarly to the methods already proposed in the literature, at a much smaller computational cost.

### Efficient Convex PCA with applications to Wasserstein geodesic PCA and ranked data

- Mathematics, Computer Science
- 2022

A numerical implementation of convex PCA is proposed for the case in which the convex set is polyhedral, and it is shown that this provides a natural approximation of Wasserstein geodesic PCA.

### Spherical Autoregressive Models, With Application to Distributional and Compositional Time Series

- Mathematics
- 2022

We introduce a new class of autoregressive models for spherical time series, where the dimension of the spheres on which the observations of the time series are situated may be finite-dimensional or…

### Normalized Latent Measure Factor Models

- Computer Science
- 2022

This work considers a prior distribution for a collection of discrete random measures where each measure is a linear combination of a set of latent measures, interpretable as characteristic traits shared by different distributions, with positive random weights.

### Autoregressive Optimal Transport Models

- Mathematics
- 2021

Series of distributions indexed by equally spaced time points are ubiquitous in applications and their analysis constitutes one of the challenges of the emerging field of distributional data…

## References

### Geodesic PCA in the Wasserstein space by Convex PCA

- Mathematics
- 2017

We introduce the method of Geodesic Principal Component Analysis (GPCA) on the space of probability measures on the line, with finite second moment, endowed with the Wasserstein metric. We discuss…

### Wasserstein Regression

- Mathematics · Journal of the American Statistical Association
- 2021

The analysis of samples of random objects that do not lie in a vector space has found increasing attention in statistics in recent years. An important class of such object data is univariate…

### Measuring Dependence in the Wasserstein Distance for Bayesian Nonparametric Models

- Computer Science
- 2020

A framework is devised to quantify the dependence of a random vector of probabilities in terms of closeness to exchangeability, which corresponds to the maximally dependent coupling with the same marginal distributions, i.e. the comonotonic vector.

### On parameter estimation with the Wasserstein distance

- Mathematics, Computer Science · Information and Inference: A Journal of the IMA
- 2019

These results cover the misspecified setting, in which the data-generating process is not assumed to be part of the family of distributions described by the model, and some difficulties arising in the numerical approximation of these estimators are discussed.

### Approximate Bayesian computation with the Wasserstein distance

- Computer Science · Journal of the Royal Statistical Society: Series B (Statistical Methodology)
- 2019

This work proposes to avoid the use of summaries and the ensuing loss of information by instead using the Wasserstein distance between the empirical distributions of the observed and synthetic data, and generalizes the well‐known approach of using order statistics within approximate Bayesian computation to arbitrary dimensions.

### Principal geodesic analysis for the study of nonlinear statistics of shape

- Mathematics · IEEE Transactions on Medical Imaging
- 2004

The method of principal geodesic analysis, a generalization of principal component analysis to the manifold setting, is developed, and its use is demonstrated in describing the variability of medially-defined anatomical objects.

### Geodesic Regression and the Theory of Least Squares on Riemannian Manifolds

- Mathematics · International Journal of Computer Vision
- 2012

Specific examples are given for a set of synthetically generated rotation data and for an application to analyzing shape changes in the corpus callosum due to age; the method can be applied generally to data on any manifold.

### Wasserstein autoregressive models for density time series

- Computer Science, Mathematics · Journal of Time Series Analysis
- 2021

The proposed autoregressive model outperforms existing methods in two of the data sets, while the best empirical performance in the other two data sets is attained by existing methods based on functional transformations of the densities.

### Simplicial principal component analysis for density functions in Bayes spaces

- Mathematics · Comput. Stat. Data Anal.
- 2016

### Intrinsic Statistics on Riemannian Manifolds: Basic Tools for Geometric Measurements

- Mathematics · Journal of Mathematical Imaging and Vision
- 2006

This paper provides a new proof of the characterization of Riemannian centers of mass and an original gradient descent algorithm to compute them efficiently, and develops the notions of mean value and covariance matrix of a random element, normal law, Mahalanobis distance and χ² law.