Distribution-Free Multisample Tests Based on Optimal Matchings With Applications to Single Cell Genomics

  Authors: Somabha Mukherjee, Divyansh Agarwal, Nancy Ruonan Zhang, Bhaswar B. Bhattacharya
  Journal: arXiv: Methodology
In this paper we propose a nonparametric graphical test based on optimal matching for assessing the equality of multiple unknown multivariate probability distributions. Our procedure pools the data from the different classes to create a graph based on the minimum non-bipartite matching, and then utilizes the number of edges connecting data points from different classes to examine the closeness between the distributions. The proposed test is exactly distribution-free (the null distribution does…
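The procedure described in this abstract can be sketched in Python as below. It pools the samples, computes a minimum non-bipartite matching, and counts the matched pairs whose two points come from different classes. This is not the authors' implementation, and it rests on several assumptions:

- Distances are Euclidean and the pooled sample size is even.
- The exact minimum-weight perfect matching is found by a bitmask dynamic programme that is only practical for about 20 points. Realistic sizes would need a blossom-type algorithm.
- The p-value is a Monte Carlo estimate obtained by permuting labels over the fixed matching.

The names `min_weight_perfect_matching`, `cross_count` and `matching_test` are hypothetical.

```python
import math
import random
from functools import lru_cache


def min_weight_perfect_matching(points):
    """Exact minimum-weight perfect (non-bipartite) matching by bitmask DP.

    Runtime grows exponentially with len(points), so this is for toy data only.
    """
    n = len(points)
    if n % 2:
        raise ValueError("the pooled sample size must be even")
    dist = [[math.dist(p, q) for q in points] for p in points]

    @lru_cache(maxsize=None)
    def solve(mask):
        # mask has a 1-bit for every point that is still unmatched.
        if mask == 0:
            return 0.0, ()
        i = (mask & -mask).bit_length() - 1  # lowest-indexed unmatched point
        rest = mask & ~(1 << i)
        best = None
        candidates = rest
        while candidates:
            j = (candidates & -candidates).bit_length() - 1
            candidates &= candidates - 1  # drop j from the candidate set
            cost, pairs = solve(rest & ~(1 << j))
            cost += dist[i][j]
            if best is None or cost < best[0]:
                best = (cost, ((i, j),) + pairs)
        return best

    return list(solve((1 << n) - 1)[1])


def cross_count(pairs, labels):
    """Number of matched pairs whose two points carry different class labels."""
    return sum(labels[i] != labels[j] for i, j in pairs)


def matching_test(samples, n_perm=999, seed=0):
    """Cross-count statistic and Monte Carlo permutation p-value (small = reject)."""
    points = [p for s in samples for p in s]
    labels = [k for k, s in enumerate(samples) for _ in s]
    pairs = min_weight_perfect_matching(points)
    observed = cross_count(pairs, labels)
    rng = random.Random(seed)
    shuffled = labels[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # relabel while keeping the matching fixed
        if cross_count(pairs, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


clusters = [
    [(0, 0), (0, 1), (1, 0), (1, 1)],
    [(10, 10), (10, 11), (11, 10), (11, 11)],
    [(20, 0), (20, 1), (21, 0), (21, 1)],
]
stat, p_value = matching_test(clusters)
print(stat, p_value)  # stat is 0: every matched pair stays inside one cluster
```

Permuting labels over a fixed matching reflects the distribution-free idea. The matching depends only on the pooled points, and under the null every assignment of class labels to those points is equally likely.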
Multivariate Rank-Based Distribution-Free Nonparametric Testing Using Measure Transportation
In this paper, we propose a general framework for distribution-free nonparametric testing in multi-dimensions, based on a notion of multivariate ranks defined using the theory of measure…
Feature Selection in High-dimensional Space Using Graph-Based Methods
High-dimensional feature selection is a central problem in a variety of application domains such as machine learning, image analysis, and genomics. In this paper, we propose graph-based tests as a…
Measuring Association on Topological Spaces Using Kernels and Geometric Graphs
In this paper we propose and study a class of simple, nonparametric, yet interpretable measures of association between two random variables $X$ and $Y$ taking values in general topological spaces.
A Bayesian nonparametric multi-sample test in any dimension
This paper considers a general Bayesian test for the multi-sample problem. Specifically, for $M$ independent samples, the interest is to determine whether the $M$ samples are generated from the same…
Islet Transplantation in the Subcutaneous Space Achieves Long-term Euglycemia in Preclinical Models of Type 1 Diabetes
The successful subcutaneous transplantation of pancreatic islets admixed with a device-free Islet Viability Matrix (IVM), resulting in long-term euglycemia in diverse immune-competent and immune-incompetent animal models, is reported.
Efficiency Lower Bounds for Distribution-Free Hotelling-Type Two-Sample Tests Based on Optimal Transport
The Wilcoxon rank-sum test is one of the most popular distribution-free procedures for testing the equality of two univariate probability distributions. One of the main reasons for its popularity can…
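For reference, the Wilcoxon rank-sum statistic mentioned here is the sum of the ranks of the first sample within the pooled sample. The sketch below computes it, using midranks for ties.

The test is distribution-free for the following reason. When the two distributions are continuous and equal, the ranks of the first sample form a uniformly random subset of $\{1, \dots, N\}$, whatever the common distribution is. The function names are illustrative only.

```python
def midranks(values):
    """1-based ranks of values, averaging the ranks of tied observations."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the block of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Sum of the pooled-sample ranks of the observations in x."""
    ranks = midranks(list(x) + list(y))
    return sum(ranks[: len(x)])


print(wilcoxon_rank_sum([1.2, 3.4, 0.5], [2.2, 5.1]))  # 7.0
```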


Graph-theoretic multisample tests of equality in distribution for high dimensional data
  • A. Petrie
    Comput. Stat. Data Anal.
  • 2016
A suite of Monte Carlo simulations shows that orthogonal perfect matchings and spanning trees typically have higher power than other graphs and are also more effective at discerning when samples have differences in their covariance structure compared to other nonparametric tests such as the energy and triangle tests.
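The spanning-tree version of such a multisample test can be sketched as follows. The code builds a Euclidean minimum spanning tree on the pooled data with Prim's algorithm and counts the tree edges that join points from different samples. Small counts suggest that the samples differ.

This is a generic Friedman-Rafsky-style edge count, assumed here for illustration. The orthogonal perfect matchings, and the exact statistics and null calibration used in the cited paper, are not reproduced. The function names are hypothetical.

```python
import math


def mst_edges(points):
    """Edges of a Euclidean minimum spanning tree via Prim's algorithm, O(n^2)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_cross_count(samples):
    """Number of MST edges that join observations from different samples."""
    points = [p for s in samples for p in s]
    labels = [k for k, s in enumerate(samples) for _ in s]
    return sum(labels[i] != labels[j] for i, j in mst_edges(points))


groups = [
    [(0, 0), (0, 1), (1, 0)],
    [(10, 0), (10, 1), (11, 0)],
    [(0, 10), (0, 11), (1, 10)],
]
print(mst_cross_count(groups))  # 2: only the two edges bridging the groups
```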
An exact distribution-free test comparing two multivariate distributions based on adjacency
A new test is proposed comparing two multivariate distributions by using distances between observations. Unlike earlier tests using interpoint distances, the new test statistic has a known…
Two-Sample Tests Based on Geometric Graphs: Asymptotic Distribution and Detection Thresholds
In this paper we consider the problem of testing the equality of two multivariate distributions based on geometric graphs, constructed using the inter-point distances between the observations. These…
A New Graph-Based Two-Sample Test for Multivariate and Object Data
A novel test statistic based on a similarity graph constructed on the pooled observations from the two samples is presented, which can be applied to multivariate data and non-Euclidean data as long as a dissimilarity measure on the sample space can be defined.
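A generalized edge-count statistic in this spirit can be sketched as follows. Build a similarity graph on the pooled data. Here it is assumed to be a symmetrised $k$-nearest-neighbour graph under Euclidean distance. Let $R_1$ and $R_2$ be the numbers of edges with both endpoints in sample 1 and with both endpoints in sample 2. The statistic is the quadratic form $(R - \mu)^\top \Sigma^{-1} (R - \mu)$ with $R = (R_1, R_2)$.

Rather than using closed-form permutation moments, this code estimates $\mu$ and $\Sigma$ by Monte Carlo permutation of the labels, a swapped-in shortcut. All function names are hypothetical.

```python
import math
import random


def knn_graph(points, k):
    """Undirected k-NN graph: {i, j} is an edge if either is among the other's k nearest."""
    edges = set()
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def within_counts(edges, labels):
    """(R1, R2): edges with both endpoints in sample 0, and in sample 1."""
    r1 = sum(labels[i] == labels[j] == 0 for i, j in edges)
    r2 = sum(labels[i] == labels[j] == 1 for i, j in edges)
    return r1, r2


def generalized_edge_count(x, y, k=2, n_perm=2000, seed=0):
    """Quadratic form in (R1, R2) with permutation-estimated mean and covariance."""
    points = list(x) + list(y)
    labels = [0] * len(x) + [1] * len(y)
    edges = knn_graph(points, k)
    r_obs = within_counts(edges, labels)
    rng = random.Random(seed)
    perm = labels[:]
    draws = []
    for _ in range(n_perm):
        rng.shuffle(perm)
        draws.append(within_counts(edges, perm))
    m1 = sum(d[0] for d in draws) / n_perm
    m2 = sum(d[1] for d in draws) / n_perm
    c11 = sum((d[0] - m1) ** 2 for d in draws) / (n_perm - 1)
    c22 = sum((d[1] - m2) ** 2 for d in draws) / (n_perm - 1)
    c12 = sum((d[0] - m1) * (d[1] - m2) for d in draws) / (n_perm - 1)
    det = c11 * c22 - c12 ** 2
    if det <= 1e-12:
        raise ValueError("estimated covariance matrix is (nearly) singular")
    a, b = r_obs[0] - m1, r_obs[1] - m2
    # (a, b) Sigma^{-1} (a, b)^T using the explicit inverse of a 2x2 matrix
    stat = (c22 * a * a - 2 * c12 * a * b + c11 * b * b) / det
    return stat, r_obs, len(edges)


x = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
y = [(px + 10, py + 10) for px, py in x]
stat, (r1, r2), n_edges = generalized_edge_count(x, y)
print(round(stat, 2), r1, r2, n_edges)
```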
A Weighted Edge-Count Two-Sample Test for Multivariate and Object Data
  • Hao Chen, Xu Chen, Yi Su
    Journal of the American Statistical Association
  • 2018
A nonparametric testing procedure is studied that uses graphs representing the similarity among observations and can be applied to any data type, as long as an informative similarity measure on the sample space can be defined.
Sensitivity Analysis for the Cross-Match Test, With Applications in Genomics
The cross-match test is an exact, distribution-free test of no treatment effect on a high-dimensional outcome in a randomized experiment. The test uses optimal nonbipartite matching to pair $2I$…
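Given the optimal non-bipartite matching of the $N = 2I$ subjects, the cross-match statistic $A_1$ is the number of pairs containing one treated and one control subject. Suppose $n$ subjects are treated, assigned at random, and there is no treatment effect. The closed form usually quoted for the null distribution of $A_1$ is

$P(A_1 = a_1) = \frac{2^{a_1}\, I!}{\binom{N}{n}\, a_0!\, a_1!\, a_2!}$,

where $a_2 = (n - a_1)/2$ is the number of treated-treated pairs and $a_0 = I - a_1 - a_2$ the number of control-control pairs.

The sketch below evaluates this formula and a one-sided p-value. It does not cover the sensitivity analysis the paper develops, and the function names are illustrative.

```python
from math import comb, factorial


def crossmatch_pmf(a1, n_total, n_treated):
    """Null probability that exactly a1 matched pairs are treated-control pairs.

    n_total = N = 2I subjects paired by a fixed perfect matching, n_treated of
    them treated at random.
    """
    if n_total % 2:
        raise ValueError("N must be even")
    n_pairs = n_total // 2
    if a1 < 0 or (n_treated - a1) % 2:
        return 0.0
    a2 = (n_treated - a1) // 2  # treated-treated pairs
    a0 = n_pairs - a1 - a2  # control-control pairs
    if a2 < 0 or a0 < 0:
        return 0.0
    numerator = 2 ** a1 * factorial(n_pairs)
    denominator = comb(n_total, n_treated) * factorial(a0) * factorial(a1) * factorial(a2)
    return numerator / denominator


def crossmatch_pvalue(a1_obs, n_total, n_treated):
    """One-sided p-value P(A1 <= a1_obs); few cross-matches indicate an effect."""
    return sum(crossmatch_pmf(a, n_total, n_treated) for a in range(a1_obs + 1))


print(crossmatch_pmf(0, 4, 2), crossmatch_pmf(2, 4, 2))  # 1/3 and 2/3
```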
Multivariate Two-Sample Tests Based on Nearest Neighbors
A new class of simple tests is proposed for the general multivariate two-sample problem based on the (possibly weighted) proportion of all k nearest neighbor comparisons in which…
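An unweighted version of this nearest-neighbour statistic can be sketched as below. For every pooled observation, the code looks at its $k$ nearest neighbours (Euclidean distance assumed). It reports the fraction of all $Nk$ comparisons in which the neighbour comes from the same sample. Large values indicate that the samples differ.

The weighting the abstract mentions and the null calibration are omitted, and the function name is hypothetical.

```python
import math


def knn_same_sample_proportion(x, y, k=3):
    """Fraction of the N*k nearest-neighbour comparisons that stay within a sample."""
    points = list(x) + list(y)
    labels = [0] * len(x) + [1] * len(y)
    same = 0
    for i, p in enumerate(points):
        neighbours = sorted((j for j in range(len(points)) if j != i),
                            key=lambda j: math.dist(p, points[j]))[:k]
        same += sum(labels[j] == labels[i] for j in neighbours)
    return same / (len(points) * k)


x = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [(10, 10), (10, 11), (11, 10), (11, 11)]
print(knn_same_sample_proportion(x, y))  # 1.0: every neighbour is in the same sample
```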
Multivariate Ranks and Quantiles using Optimal Transportation and Applications to Goodness-of-fit Testing
In this paper we study multivariate ranks and quantiles, defined using the theory of optimal transportation, and build on the work of Chernozhukov et al. (2017) and del Barrio et al. (2018). We study…
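The idea of an empirical optimal-transport rank can be illustrated by matching the $n$ sample points one-to-one to $n$ fixed reference points so that the total squared Euclidean distance is minimal. The reference points are chosen by the user, for instance a grid or quasi-Monte Carlo points in $[0,1]^d$. Each point's matched reference point is then its multivariate rank.

The brute-force search over all $n!$ assignments below is only a toy stand-in for a proper assignment solver, and the function name is hypothetical.

```python
import itertools
import math


def ot_ranks(sample, reference):
    """Assign each sample point to a distinct reference point, minimising total squared distance.

    Brute force over all n! assignments, so only for very small n (about 8).
    """
    if len(sample) != len(reference):
        raise ValueError("sample and reference must have equal size")
    n = len(sample)
    best_cost, best_perm = math.inf, None
    for perm in itertools.permutations(range(n)):
        cost = sum(math.dist(sample[i], reference[perm[i]]) ** 2 for i in range(n))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return [reference[j] for j in best_perm]


sample = [(2.7,), (0.3,), (1.9,), (5.0,)]
reference = [(0.25,), (0.5,), (0.75,), (1.0,)]
print(ot_ranks(sample, reference))  # [(0.75,), (0.25,), (0.5,), (1.0,)]
```

In one dimension with reference points $i/n$, the optimal assignment is monotone, so it recovers the usual ranks divided by $n$. In the example, $(0.75, 0.25, 0.5, 1.0)$ corresponds to ranks $3, 1, 2, 4$.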
A distribution-free two-sample run test applicable to high-dimensional data
We propose a multivariate generalization of the univariate two-sample run test based on the shortest Hamiltonian path. The proposed test is distribution-free in finite samples. While most existing…
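The run statistic along a shortest Hamiltonian path can be sketched as follows. First find the shortest path through all pooled points, here by the exact Held-Karp dynamic programme, which is feasible only for toy sizes. Then count the runs, that is, the maximal stretches of consecutive points from the same sample. Few runs suggest that the distributions differ.

Euclidean distance is assumed, the paper's null distribution is not reproduced, and the function names are hypothetical.

```python
import math


def shortest_hamiltonian_path(points):
    """Held-Karp dynamic programme for an open path; O(2^n n^2), toy sizes only."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    full = 1 << n
    cost = [[math.inf] * n for _ in range(full)]  # cost[mask][j]: best path over mask ending at j
    prev = [[-1] * n for _ in range(full)]
    for j in range(n):
        cost[1 << j][j] = 0.0
    for mask in range(full):
        for j in range(n):
            c = cost[mask][j]
            if c == math.inf:
                continue
            for k in range(n):
                if mask & (1 << k):
                    continue
                nxt = mask | (1 << k)
                if c + dist[j][k] < cost[nxt][k]:
                    cost[nxt][k] = c + dist[j][k]
                    prev[nxt][k] = j
    mask = full - 1
    end = min(range(n), key=lambda j: cost[mask][j])
    path = []
    while end != -1:  # walk the predecessor links back to the start
        path.append(end)
        end, mask = prev[mask][end], mask & ~(1 << end)
    return path[::-1]


def run_count(x, y):
    """1 + number of sample-label changes along the shortest Hamiltonian path."""
    points = list(x) + list(y)
    labels = [0] * len(x) + [1] * len(y)
    path = shortest_hamiltonian_path(points)
    return 1 + sum(labels[a] != labels[b] for a, b in zip(path, path[1:]))


x = [(0, 0), (1, 0), (2, 0)]
y = [(10, 0), (11, 0), (12, 0)]
print(run_count(x, y))  # 2: the path visits all of x, then all of y
```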
Testing the equality of distributions of random vectors with categorical components
We develop a method for testing the equality of two or more distributions of random vectors with categorical components. We define a function that gives a distance between any two data vectors. Each…