# People mover's distance: Class level geometry using fast pairwise data adaptive transportation costs

@article{Cloninger2019PeopleMD, title={People mover's distance: Class level geometry using fast pairwise data adaptive transportation costs}, author={Alexander Cloninger and Brita Roy and Carley Riley and Harlan M. Krumholz}, journal={Applied and Computational Harmonic Analysis}, year={2019} }

We address the problem of defining a network graph on a large collection of classes. Each class is comprised of a collection of data points, sampled in a non i.i.d. way, from some unknown underlying distribution. The application we consider in this paper is a large scale high dimensional survey of people living in the US, and the question of how similar or different are the various counties in which these people live. We use a co-clustering diffusion metric to learn the underlying distribution…

#### 2 Citations

Linear Optimal Transport Embedding: Provable fast Wasserstein distance computation and classification for nonlinear problems

- Mathematics, Computer Science
- ArXiv
- 2020

This paper characterizes a number of settings in which LOT embeds families of distributions into a space in which they are linearly separable, and proves conditions under which the distance of the LOT embedding between two distributions in arbitrary dimension is nearly isometric to the Wasserstein-2 distance between those distributions.

A low discrepancy sequence on graphs

- Computer Science, Mathematics
- Journal of Fourier Analysis and Applications
- 2021

This work describes a construction of a sampling scheme, analogous to the so-called Leja points in complex potential theory, that can be proved to give low discrepancy estimates for the approximation of the expected value by the empirical expected value based on these points.

#### References


On the Surprising Behavior of Distance Metrics in High Dimensional Spaces

- Computer Science
- ICDT
- 2001

This paper examines the behavior of the commonly used L_k norm and shows that the problem of meaningfulness in high dimensionality is sensitive to the value of k, which means that the Manhattan distance metric is consistently preferable to the Euclidean distance metric for high dimensional data mining applications.

The Earth Mover's Distance as a Metric for Image Retrieval

- Mathematics, Computer Science
- International Journal of Computer Vision
- 2004

This paper investigates the properties of a metric between two distributions, the Earth Mover's Distance (EMD), for content-based image retrieval, and compares the retrieval performance of the EMD with that of other distances.

Earth Mover’s Distance and Equivalent Metrics for Spaces with Hierarchical Partition Trees

- 2013

partition tree, and prove their equivalence. Similar metrics have previously been defined in more restrictive settings; in particular, the well-known Earth Mover’s Distance is widely used in machine…

Approximate earth mover’s distance in linear time

- Mathematics, Computer Science
- 2008 IEEE Conference on Computer Vision and Pattern Recognition
- 2008

It is experimentally shown that wavelet EMD is a good approximation to EMD and has similar performance but requires much less computation, while the comparison is about as fast as for normal Euclidean distance or the chi2 statistic.

Diffusion maps

- 2006

In this paper, we provide a framework based upon diffusion processes for finding meaningful geometric descriptions of data sets. We show that eigenfunctions of Markov matrices can be used to…

Understanding bag-of-words model: a statistical framework

- Computer Science
- Int. J. Mach. Learn. Cybern.
- 2010

A statistical framework which generalizes the bag-of-words representation, in which the visual words are generated by a statistical process rather than by a clustering algorithm, while the empirical performance is competitive with clustering-based methods.

Pattern Classification

- Springer London
- 2001

Classification • Supervised – parallelepiped – minimum distance – maximum likelihood (Bayes Rule) > non-parametric > parametric – support vector machines – neural networks – context classification •…

Hölder–Lipschitz Norms and Their Duals on Spaces with Semigroups, with Applications to Earth Mover’s Distance

- Mathematics
- 2016

We introduce a family of bounded, multiscale distances on any space equipped with an operator semigroup. In many examples, these distances are equivalent to a snowflake of the natural distance on the…

Sampling, denoising and compression of matrices by coherent matrix organization

- Mathematics
- 2012

The need to organize and analyze real-valued matrices arises in various settings – notably, in data analysis (where matrices are multivariate data sets) and in numerical analysis (where…

Topic modeling: beyond bag-of-words

- Computer Science
- ICML
- 2006

A hierarchical generative probabilistic model that incorporates both n-gram statistics and latent topic variables by extending a unigram topic model to include properties of a hierarchical Dirichlet bigram language model is explored.