Corpus ID: 239009914

A Distribution-Free Independence Test for High Dimension Data

Zhanrui Cai, Jing Lei, Kathryn Roeder
Testing for independence is of fundamental importance in modern data analysis, with broad applications in variable selection, graphical models, and causal inference. When the data are high-dimensional and the potential dependence signal is sparse, independence testing becomes very challenging without distributional or structural assumptions. In this paper, we propose a general framework for independence testing by first fitting a classifier that distinguishes the joint and product distributions, and… 
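The first step named in the abstract, fitting a classifier that separates the joint distribution from the product of the marginals, can be illustrated with a small sketch. It is not the authors' procedure: the abstract is cut off before the later steps, and every choice below is an assumption. Product-distribution samples are formed by shuffling the y values within half of the data, the classifier is a plain 1-nearest-neighbour rule, and the function names (`split_joint_and_product`, `predict_1nn`, `heldout_accuracy`) are hypothetical. Held-out accuracy near 1/2 is what independence predicts; accuracy well above 1/2 points to dependence.

```python
import random

def split_joint_and_product(xs, ys, rng):
    """Hypothetical helper. Half of the (x, y) pairs keep their pairing (label 1,
    joint). In the other half the y's are shuffled among themselves, so those
    pairs behave like draws from the product of the marginals (label 0)."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    half = len(idx) // 2
    joint = [((xs[i], ys[i]), 1) for i in idx[:half]]
    rest = idx[half:]
    ys_rest = [ys[i] for i in rest]
    rng.shuffle(ys_rest)                      # break the x-y pairing
    product = [((xs[i], y), 0) for i, y in zip(rest, ys_rest)]
    data = joint + product
    rng.shuffle(data)                         # mix labels before splitting
    return data

def predict_1nn(train, point):
    """Label of the nearest training pair in squared Euclidean distance."""
    nearest = min(train, key=lambda item: (item[0][0] - point[0]) ** 2
                                          + (item[0][1] - point[1]) ** 2)
    return nearest[1]

def heldout_accuracy(xs, ys, seed=0):
    """Fit on one half of the labelled pairs, report accuracy on the other half."""
    rng = random.Random(seed)
    data = split_joint_and_product(xs, ys, rng)
    cut = len(data) // 2
    train, test = data[:cut], data[cut:]
    correct = sum(predict_1nn(train, p) == label for p, label in test)
    return correct / len(test)

if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.gauss(0, 1) for _ in range(400)]
    ys_dep = [x + 0.05 * rng.gauss(0, 1) for x in xs]   # strongly dependent on xs
    ys_ind = [rng.gauss(0, 1) for _ in range(400)]       # independent of xs
    print(heldout_accuracy(xs, ys_dep), heldout_accuracy(xs, ys_ind))
```

Shuffling only within one half leaves the joint-labelled pairs untouched, so under independence both classes share one distribution and no classifier should beat chance on held-out data beyond sampling noise.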



Testing mutual independence in high dimension via distance covariance
We introduce an L2-type test for testing mutual independence and banded dependence structure for high-dimensional data. The test is constructed on the basis of the pairwise distance covariance and…
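The aggregation idea, combining distance covariances over all pairs of components, can be sketched for scalar components as below. This is an illustration under assumptions, not the paper's test statistic: it sums plain V-statistic estimates of squared distance covariance, while the cited test's exact estimator, standardization, and null calibration are not given in the snippet. All function names are hypothetical.

```python
from itertools import combinations

def double_centered_distances(z):
    """Double-centred matrix of |z_i - z_j| for a scalar sample z."""
    n = len(z)
    a = [[abs(zi - zj) for zj in z] for zi in z]
    means = [sum(row) / n for row in a]        # row means equal column means (symmetric)
    grand = sum(means) / n
    return [[a[i][j] - means[i] - means[j] + grand for j in range(n)] for i in range(n)]

def dcov2(A, B):
    """V-statistic estimate of squared distance covariance from two double-centred matrices."""
    n = len(A)
    return sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n ** 2

def pairwise_dcov_sum(rows):
    """Sum of squared sample distance covariances over all pairs of components.
    rows: n observations, each a sequence of p scalar components."""
    columns = list(zip(*rows))
    centred = [double_centered_distances(col) for col in columns]
    return sum(dcov2(centred[k], centred[l])
               for k, l in combinations(range(len(columns)), 2))

if __name__ == "__main__":
    rows = [(0.0, 0.1, 5.0), (1.0, 0.9, 2.0), (2.0, 2.2, 4.0), (3.0, 2.8, 1.0)]
    print(pairwise_dcov_sum(rows))
```

Large values arise when at least one pair of components is dependent; how large is "large" would still have to be calibrated, for example by independently permuting each component.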
Global and local two-sample tests via regression
Two-sample testing is a fundamental problem in statistics. Despite its long history, there has been renewed interest in this problem with the advent of high-dimensional and complex data.
The distance correlation t-test of independence in high dimension
A modified distance correlation statistic is proposed such that, under independence, the distribution of a transformation of the statistic converges to Student's t distribution as the dimension tends to infinity; the resulting t-test is unbiased for every sample size greater than three and all significance levels.
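A sketch of a distance correlation t-test follows. As an assumption, it uses the U-centred (bias-corrected) distance correlation R and the transformation T = sqrt(v - 1) R / sqrt(1 - R^2) with v = n(n - 3)/2, referred to Student's t with v - 1 degrees of freedom; the centring constants of the modified statistic in the cited paper may differ from the ones used here. Function names are hypothetical, and each observation may be a vector of any dimension, compared with Euclidean distance.

```python
import math

def u_centered(points):
    """U-centred Euclidean distance matrix (zero diagonal); points: sequence of vectors."""
    n = len(points)
    a = [[math.dist(p, q) for q in points] for p in points]
    row = [sum(r) for r in a]
    total = sum(row)
    return [[0.0 if i == j else
             a[i][j] - row[i] / (n - 2) - row[j] / (n - 2) + total / ((n - 1) * (n - 2))
             for j in range(n)] for i in range(n)]

def u_inner(A, B):
    """Inner product of U-centred matrices: off-diagonal sum divided by n(n - 3)."""
    n = len(A)
    return sum(A[i][j] * B[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 3))

def bias_corrected_dcor(xs, ys):
    """Bias-corrected distance correlation; assumes neither sample is constant."""
    if len(xs) < 4:
        raise ValueError("need at least 4 observations")
    A, B = u_centered(xs), u_centered(ys)
    return u_inner(A, B) / math.sqrt(u_inner(A, A) * u_inner(B, B))

def dcor_ttest(xs, ys):
    """Return (T, degrees of freedom); |R| = 1 gives an infinite statistic."""
    n = len(xs)
    r = bias_corrected_dcor(xs, ys)
    v = n * (n - 3) / 2
    if abs(r) >= 1:
        return math.copysign(math.inf, r), v - 1
    return math.sqrt(v - 1) * r / math.sqrt(1 - r * r), v - 1

if __name__ == "__main__":
    xs = [(float(i), float((3 * i) % 7)) for i in range(12)]
    zs = [(float((5 * i) % 11),) for i in range(12)]
    print(dcor_ttest(xs, zs))
```

A one-sided test rejects independence for large T, since dependence pushes R upward.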
On some exact distribution-free tests of independence between two random vectors of arbitrary dimensions
Several nonparametric methods are available in the literature to test the independence between two random vectors, but many of them perform poorly for high-dimensional data and are not…
Nonparametric independence testing via mutual information
This work proposes a test of independence of two multivariate random vectors, given a sample from the underlying population, based on the estimation of mutual information, whose decomposition into joint and marginal entropies facilitates the use of recently developed efficient entropy estimators derived from nearest-neighbour distances.
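The decomposition I(X; Y) = H(X) + H(Y) - H(X, Y) can be paired with a nearest-neighbour entropy estimator as sketched below. We assume the basic, unweighted Kozachenko-Leonenko estimator Ĥ = (1/n) Σ_i log(ρ_i^d V_d (n - 1) e^(-ψ(k))), where ρ_i is the distance from point i to its k-th nearest neighbour and V_d is the volume of the unit ball in d dimensions; the cited work relies on more refined estimators, and turning the estimate into a calibrated test (for example by permutation) is not shown. Function names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(m):
    """psi(m) for a positive integer m: -gamma + sum_{j=1}^{m-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))

def kl_entropy(points, k=3):
    """Kozachenko-Leonenko k-NN estimate of differential entropy in nats.
    points: list of equal-length tuples; assumes no duplicate points."""
    n, d = len(points), len(points[0])
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(1 + d / 2)
    log_rho_sum = 0.0
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        log_rho_sum += math.log(dists[k - 1])   # distance to the k-th nearest neighbour
    return d * log_rho_sum / n + log_unit_ball + math.log(n - 1) - digamma_int(k)

def mutual_information(xs, ys, k=3):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), each entropy estimated by kl_entropy."""
    joint = [tuple(x) + tuple(y) for x, y in zip(xs, ys)]
    return kl_entropy(xs, k) + kl_entropy(ys, k) - kl_entropy(joint, k)

if __name__ == "__main__":
    rng = random.Random(0)
    xs = [(rng.gauss(0, 1),) for _ in range(300)]
    ys = [(x[0] + 0.5 * rng.gauss(0, 1),) for x in xs]
    print(mutual_information(xs, ys))
```

Estimating all three entropies with the same k and the same sample is a simple choice; it keeps the code short but does not cancel the estimators' biases.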
A Kernel Statistical Test of Independence
A novel test of the independence hypothesis is proposed for one particular kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC); the test outperforms established contingency-table and functional-correlation-based tests, and this advantage is greater for multivariate data.
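The HSIC statistic can be sketched with Gaussian kernels as below. Assumptions: scalar data, bandwidths chosen by the median heuristic, the biased estimate HSIC_b = tr(KHLH)/n^2 with centring matrix H = I - (1/n)11ᵀ, and a permutation null distribution in place of an analytic null approximation. Function names are hypothetical.

```python
import math
import random

def gaussian_gram(z, sigma):
    return [[math.exp(-(a - b) ** 2 / (2 * sigma ** 2)) for b in z] for a in z]

def median_bandwidth(z):
    """Median pairwise distance (median heuristic); falls back to 1.0 if it is zero."""
    d = sorted(abs(a - b) for i, a in enumerate(z) for b in z[i + 1:])
    m = d[len(d) // 2]
    return m if m > 0 else 1.0

def center(K):
    """H K H for a symmetric matrix K."""
    n = len(K)
    row = [sum(r) / n for r in K]
    grand = sum(row) / n
    return [[K[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]

def hsic_biased(K, Lc):
    """tr(K H L H) / n^2, where Lc = H L H; for symmetric matrices the trace is sum_ij K_ij Lc_ij."""
    n = len(K)
    return sum(K[i][j] * Lc[i][j] for i in range(n) for j in range(n)) / n ** 2

def hsic_permutation_test(x, y, n_perm=200, seed=0):
    rng = random.Random(seed)
    K = gaussian_gram(x, median_bandwidth(x))
    Lc = center(gaussian_gram(y, median_bandwidth(y)))
    stat = hsic_biased(K, Lc)
    n = len(y)
    exceed = 0
    for _ in range(n_perm):
        p = list(range(n))
        rng.shuffle(p)
        # permuting the y's commutes with centring, so permute Lc directly
        Lp = [[Lc[p[i]][p[j]] for j in range(n)] for i in range(n)]
        if hsic_biased(K, Lp) >= stat:
            exceed += 1
    return stat, (1 + exceed) / (1 + n_perm)

if __name__ == "__main__":
    rng = random.Random(3)
    x = [rng.gauss(0, 1) for _ in range(50)]
    y = [v * v + 0.1 * rng.gauss(0, 1) for v in x]   # dependent but uncorrelated
    print(hsic_permutation_test(x, y))
```

The example uses y close to x², a dependence with essentially zero linear correlation, which is the kind of signal a kernel measure is meant to pick up.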
Universal inference
A surprisingly simple method is proposed for producing statistical significance statements without any regularity conditions, and it is shown that, in settings where computing the MLE is hard, it is sufficient for the purpose of constructing valid tests and intervals to upper bound the maximum likelihood.
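The split likelihood-ratio idea behind this kind of universal test can be sketched on a toy problem: H0: μ = 0 for N(μ, 1) data with known unit variance, which is our assumption. The data are split into D0 and D1, μ is estimated on D1, and U = L0(μ̂1)/L0(0) is computed from the likelihood of D0. Under H0 the expectation of U is at most 1, so by Markov's inequality rejecting when U ≥ 1/α keeps the type I error at most α. The function name is hypothetical.

```python
import math
import random

def split_lrt_gaussian_mean(xs, alpha=0.05, seed=0):
    """Hypothetical split LRT of H0: mu = 0 for N(mu, 1) data.
    Returns (log U, reject?)."""
    rng = random.Random(seed)
    data = list(xs)
    rng.shuffle(data)
    half = len(data) // 2
    d0, d1 = data[:half], data[half:]
    mu1 = sum(d1) / len(d1)          # estimate from D1 (any estimator keeps validity)
    # log N(x; mu1, 1) - log N(x; 0, 1) = (x^2 - (x - mu1)^2) / 2, summed over D0.
    # H0 is simple, so the null "MLE" in the denominator is just mu = 0.
    log_u = sum((x * x - (x - mu1) ** 2) / 2 for x in d0)
    return log_u, log_u >= math.log(1 / alpha)

if __name__ == "__main__":
    rng = random.Random(5)
    print(split_lrt_gaussian_mean([rng.gauss(0.3, 1) for _ in range(200)]))
```

For a composite null the denominator becomes the null MLE on D0, and replacing it with any upper bound on the null likelihood only makes U smaller, so the test stays valid.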
Classification Accuracy as a Proxy for Two Sample Testing
This work proves two results that hold for all classifiers in any dimension: if a classifier's true error remains $\epsilon$-better than chance for some $\epsilon > 0$ as $d, n \to \infty$, then (a) the permutation-based test is consistent (its power approaches one), and (b) a computationally efficient test based on a Gaussian approximation of the null distribution is also consistent.
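The Gaussian-approximation route can be illustrated in its simplest form. Assuming balanced classes and a classifier fit on data disjoint from the n test points, the number of correct predictions under the null is Binomial(n, 1/2), so z = 2√n (accuracy - 1/2) is approximately standard normal. This is a sketch with a hypothetical function name; the null variance used in the cited work may differ.

```python
import math

def accuracy_pvalue(n_correct, n_test):
    """One-sided p-value for H0: accuracy = 1/2, via the normal approximation to
    Binomial(n_test, 1/2). Assumes balanced test classes and a classifier trained
    on data disjoint from the test set."""
    acc = n_correct / n_test
    z = (acc - 0.5) * 2 * math.sqrt(n_test)     # (acc - 1/2) / sqrt(1 / (4 n))
    return 0.5 * math.erfc(z / math.sqrt(2))    # upper-tail normal probability

if __name__ == "__main__":
    print(accuracy_pvalue(60, 100))
```

The permutation-based alternative would instead refit the classifier on label-shuffled data many times, which is exact but much more expensive.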
Multivariate Rank-Based Distribution-Free Nonparametric Testing Using Measure Transportation
In this paper, we propose a general framework for distribution-free nonparametric testing in multiple dimensions, based on a notion of multivariate ranks defined using the theory of measure transportation…
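One way to read "multivariate ranks defined using measure transportation" is as an optimal matching between the n sample points and n fixed reference points that minimises total squared Euclidean distance; in one dimension, with reference points spread evenly over (0, 1), this recovers the usual scaled ranks. The sketch below assumes that reading, uses a brute-force search that only works for a handful of points (a practical version would use an assignment solver), and its function name is hypothetical.

```python
from itertools import permutations

def ot_ranks(points, grid):
    """Hypothetical helper: match each sample point to a distinct reference point so
    that the total squared Euclidean cost is minimal, and return the matched
    reference point of each sample as its multivariate rank. Brute force over all
    n! matchings, so illustration only."""
    n = len(points)
    if len(grid) != n:
        raise ValueError("need exactly one reference point per sample point")

    def cost(perm):
        # perm[i] is the index of the reference point assigned to sample i
        return sum(sum((a - b) ** 2 for a, b in zip(points[i], grid[perm[i]]))
                   for i in range(n))

    best = min(permutations(range(n)), key=cost)
    return [grid[j] for j in best]

if __name__ == "__main__":
    pts = [(0.9, 0.1), (0.2, 0.8), (0.5, 0.5), (0.1, 0.2)]
    ref = [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75)]
    print(ot_ranks(pts, ref))
```

Because the reference points are fixed in advance and the matching uses the sample only through its geometry, the resulting ranks have a distribution that does not depend on the data-generating distribution when the sample is i.i.d. and continuous, which is what makes rank-based tests distribution-free.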
A Distribution-Free Test of Covariate Shift Using Conformal Prediction
This is the first successful attempt at using conformal prediction for testing statistical hypotheses; the method can be effectively combined with existing classification algorithms to find good conformity score functions.