# Distribution-Free Multisample Tests Based on Optimal Matchings With Applications to Single Cell Genomics

@article{Mukherjee2019DistributionFreeMT, title={Distribution-Free Multisample Tests Based on Optimal Matchings With Applications to Single Cell Genomics}, author={Somabha Mukherjee and Divyansh Agarwal and Nancy Ruonan Zhang and Bhaswar B. Bhattacharya}, journal={arXiv: Methodology}, year={2019} }

In this paper we propose a nonparametric graphical test based on optimal matching, for assessing the equality of multiple unknown multivariate probability distributions. Our procedure pools the data from the different classes to create a graph based on the minimum non-bipartite matching, and then utilizes the number of edges connecting data points from different classes to examine the closeness between the distributions. The proposed test is exactly distribution-free (the null distribution does…
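The counting step described in the abstract can be illustrated with a minimal sketch. It assumes Euclidean distances, an even pooled sample size, and a brute-force bitmask search that is only practical for a handful of points; the function names `min_weight_perfect_matching` and `cross_match_counts` are hypothetical and are not taken from the paper. The sketch returns only the cross-class edge counts and does not compute the paper's test statistic, null distribution, or p-value.

```python
from functools import lru_cache
import math


def min_weight_perfect_matching(points):
    """Exact minimum-weight perfect matching via bitmask DP.

    Illustrative only: exponential in the number of points, so it is
    suitable for small, even-sized pooled samples. Larger problems are
    normally solved with a blossom-type matching algorithm.
    """
    n = len(points)
    if n % 2:
        raise ValueError("this sketch assumes an even number of points")
    dist = [[math.dist(p, q) for q in points] for p in points]

    @lru_cache(maxsize=None)
    def best(mask):
        # mask has bit i set when point i is still unmatched.
        if mask == 0:
            return 0.0, ()
        i = (mask & -mask).bit_length() - 1  # lowest unmatched index
        rest = mask & ~(1 << i)
        best_cost, best_pairs = math.inf, ()
        for j in range(i + 1, n):
            if rest & (1 << j):
                cost, pairs = best(rest & ~(1 << j))
                cost += dist[i][j]
                if cost < best_cost:
                    best_cost, best_pairs = cost, ((i, j),) + pairs
        return best_cost, best_pairs

    return best((1 << n) - 1)[1]


def cross_match_counts(points, labels):
    """Count matched pairs whose endpoints carry different class labels."""
    counts = {}
    for i, j in min_weight_perfect_matching(points):
        a, b = labels[i], labels[j]
        if a != b:
            key = tuple(sorted((a, b)))
            counts[key] = counts.get(key, 0) + 1
    return counts


# Two classes interleaved on a line: every matched pair crosses classes.
mixed = cross_match_counts([(0,), (1,), (2,), (3,)], ["a", "b", "a", "b"])
print(mixed)  # {('a', 'b'): 2}

# Three well-separated classes: all pairs stay within a class.
separated = cross_match_counts(
    [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)],
    ["a", "a", "b", "b", "c", "c"],
)
print(separated)  # {}
```

Many cross-class pairs, as in the first example, are consistent with the classes sharing a distribution; few cross-class pairs, as in the second, point toward a difference.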

## 6 Citations

Multivariate Rank-Based Distribution-Free Nonparametric Testing Using Measure Transportation

- Mathematics
- Journal of the American Statistical Association
- 2021

In this paper, we propose a general framework for distribution-free nonparametric testing in multi-dimensions, based on a notion of multivariate ranks defined using the theory of measure…

Feature Selection in High-dimensional Space Using Graph-Based Methods

- Mathematics
- 2021

High-dimensional feature selection is a central problem in a variety of application domains such as machine learning, image analysis, and genomics. In this paper, we propose graph-based tests as a…

Measuring Association on Topological Spaces Using Kernels and Geometric Graphs

- Mathematics
- 2020

In this paper we propose and study a class of simple, nonparametric, yet interpretable measures of association between two random variables $X$ and $Y$ taking values in general topological spaces.…

A Bayesian nonparametric multi-sample test in any dimension

- Mathematics
- AStA Advances in Statistical Analysis
- 2021

This paper considers a general Bayesian test for the multi-sample problem. Specifically, for M independent samples, the interest is to determine whether the M samples are generated from the same…

Islet Transplantation in the Subcutaneous Space Achieves Long-term Euglycemia in Preclinical Models of Type 1 Diabetes

- Medicine
- Nature Metabolism
- 2020

The successful subcutaneous transplantation of pancreatic islets admixed with a device-free Islet Viability Matrix (IVM), resulting in long-term euglycemia in diverse immune-competent and immuno-incompetent animal models, is reported.

Efficiency Lower Bounds for Distribution-Free Hotelling-Type Two-Sample Tests Based on Optimal Transport

- Mathematics
- 2021

The Wilcoxon rank-sum test is one of the most popular distribution-free procedures for testing the equality of two univariate probability distributions. One of the main reasons for its popularity can…

## References

Graph-theoretic multisample tests of equality in distribution for high dimensional data

- Mathematics, Computer Science
- Comput. Stat. Data Anal.
- 2016

A suite of Monte Carlo simulations shows that orthogonal perfect matchings and spanning trees typically have higher power than other graphs and are also more effective at discerning when samples have differences in their covariance structure compared to other nonparametric tests such as the energy and triangle tests.

An exact distribution‐free test comparing two multivariate distributions based on adjacency

- Mathematics
- 2005

A new test is proposed comparing two multivariate distributions by using distances between observations. Unlike earlier tests using interpoint distances, the new test statistic has a known…

Two-Sample Tests Based on Geometric Graphs: Asymptotic Distribution and Detection Thresholds

- Mathematics
- 2015

In this paper we consider the problem of testing the equality of two multivariate distributions based on geometric graphs, constructed using the inter-point distances between the observations. These…

A New Graph-Based Two-Sample Test for Multivariate and Object Data

- Computer Science, Mathematics
- 2013

A novel test statistic based on a similarity graph constructed on the pooled observations from the two samples is presented, which can be applied to multivariate data and non-Euclidean data as long as a dissimilarity measure on the sample space can be defined.

A Weighted Edge-Count Two-Sample Test for Multivariate and Object Data

- Computer Science, Mathematics
- Journal of the American Statistical Association
- 2018

A nonparametric testing procedure is studied that uses graphs representing the similarity among observations and can be applied to any data type for which an informative similarity measure on the sample space can be defined.

Sensitivity Analysis for the Cross-Match Test, With Applications in Genomics

- Mathematics
- 2010

The cross-match test is an exact, distribution-free test of no treatment effect on a high-dimensional outcome in a randomized experiment. The test uses optimal nonbipartite matching to pair 2I…

Multivariate Two-Sample Tests Based on Nearest Neighbors

- Mathematics
- 1986

A new class of simple tests is proposed for the general multivariate two-sample problem based on the (possibly weighted) proportion of all k nearest neighbor comparisons in which…

Multivariate Ranks and Quantiles using Optimal Transportation and Applications to Goodness-of-fit Testing

- Mathematics
- 2019

In this paper we study multivariate ranks and quantiles, defined using the theory of optimal transportation, and build on the work of Chernozhukov et al. (2017) and del Barrio et al. (2018). We study…

A distribution-free two-sample run test applicable to high-dimensional data

- Mathematics
- 2014

We propose a multivariate generalization of the univariate two-sample run test based on the shortest Hamiltonian path. The proposed test is distribution-free in finite samples. While most existing…

Testing the equality of distributions of random vectors with categorical components

- Mathematics
- 2001

We develop a method for testing the equality of two or more distributions of random vectors with categorical components. We define a function that gives a distance between any two data vectors. Each…