# A Distribution-Free Independence Test for High Dimension Data

@inproceedings{Cai2021ADI, title={A Distribution-Free Independence Test for High Dimension Data}, author={Zhanrui Cai and Jing Lei and Kathryn Roeder}, year={2021} }

Test of independence is of fundamental importance in modern data analysis, with broad applications in variable selection, graphical models, and causal inference. When the data is high dimensional and the potential dependence signal is sparse, independence testing becomes very challenging without distributional or structural assumptions. In this paper we propose a general framework for independence testing by first fitting a classifier that distinguishes the joint and product distributions, and…
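The abstract is truncated above, so the following is only a minimal sketch of the general idea it names: fit a classifier that separates samples from the joint distribution from samples that mimic the product distribution, then check whether it beats chance on held-out data. The k-nearest-neighbour classifier, the cyclic shift used to break the pairing, the binomial reference for held-out accuracy, and all function names are assumptions made for illustration. They are not the paper's test statistic or calibration.

```python
import math
import random


def make_pairs(xs, ys):
    """Label-1 pairs keep (x_i, y_i); label-0 pairs break the pairing."""
    half = len(xs) // 2
    joint = [((xs[i], ys[i]), 1) for i in range(half)]
    # Cyclic shift of y within the second half mimics the product distribution.
    idx = list(range(half, 2 * half))
    shifted = idx[1:] + idx[:1]
    product = [((xs[i], ys[j]), 0) for i, j in zip(idx, shifted)]
    return joint + product


def knn_predict(train, point, k=15):
    """Majority vote among the k nearest training points (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def classifier_independence_test(xs, ys, k=15):
    """Return (held-out accuracy, one-sided binomial p-value)."""
    n = len(xs) // 2
    train = make_pairs(xs[:n], ys[:n])   # first half: fit the classifier
    test = make_pairs(xs[n:], ys[n:])    # second half: evaluate it
    correct = sum(knn_predict(train, z, k) == label for z, label in test)
    m = len(test)
    # Assumed reference: upper tail of Binomial(m, 1/2), i.e. chance accuracy.
    p_value = sum(math.comb(m, j) for j in range(correct, m + 1)) / 2 ** m
    return correct / m, p_value


rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(800)]
y_dep = [xi + 0.3 * rng.gauss(0, 1) for xi in x]   # depends on x
y_ind = [rng.gauss(0, 1) for _ in x]               # independent of x

acc_dep, p_dep = classifier_independence_test(x, y_dep)
acc_ind, p_ind = classifier_independence_test(x, y_ind)
print(f"dependent:   accuracy={acc_dep:.3f}, p-value={p_dep:.3g}")
print(f"independent: accuracy={acc_ind:.3f}, p-value={p_ind:.3g}")
```

The sketch uses scalar `x` and `y` to stay short. Any classifier can replace `knn_predict`, and fitting and evaluating on disjoint halves is what keeps the chance-level reference meaningful when the two variables are independent.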

## References

Testing mutual independence in high dimension via distance covariance

- Mathematics
- 2016

We introduce an L2-type test for testing mutual independence and banded dependence structure for high dimensional data. The test is constructed on the basis of the pairwise distance covariance and…

Global and local two-sample tests via regression

- Mathematics
- Electronic Journal of Statistics
- 2019

Two-sample testing is a fundamental problem in statistics. Despite its long history, there has been renewed interest in this problem with the advent of high-dimensional and complex data.…

The distance correlation t-test of independence in high dimension

- Computer Science, Mathematics
- J. Multivar. Anal.
- 2013

A modified distance correlation statistic is proposed, such that under independence the distribution of a transformation of the statistic converges to Student t, as dimension tends to infinity, and the resulting t-test is unbiased for every sample size greater than three and all significance levels.

On some exact distribution-free tests of independence between two random vectors of arbitrary dimensions

- Mathematics
- 2016

Several nonparametric methods are available in the literature to test the independence between two random vectors. But many of them perform poorly for high dimensional data and are not…

Nonparametric independence testing via mutual information

- Mathematics, Computer Science
- Biometrika
- 2019

This work proposes a test of independence of two multivariate random vectors, given a sample from the underlying population, based on the estimation of mutual information, whose decomposition into joint and marginal entropies facilitates the use of recently-developed efficient entropy estimators derived from nearest neighbour distances.

A Kernel Statistical Test of Independence

- Computer Science, Mathematics
- NIPS
- 2007

A novel test of the independence hypothesis is proposed for one particular kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC). The test outperforms established contingency table and functional correlation-based tests, and this advantage is greater for multivariate data.

Universal inference

- Medicine, Computer Science
- Proceedings of the National Academy of Sciences
- 2020

A surprisingly simple method is proposed for producing statistical significance statements without any regularity conditions. It is shown that, in settings where computing the MLE is hard, it is sufficient to upper bound the maximum likelihood for the purpose of constructing valid tests and intervals.

Classification Accuracy as a Proxy for Two Sample Testing

- Mathematics, Computer Science
- ArXiv
- 2016

This work proves two results that hold for all classifiers in any dimension: if a classifier's true error remains $\epsilon$-better than chance for some $\epsilon>0$ as $d,n \to \infty$, then (a) the permutation-based test is consistent (has power approaching one), and (b) a computationally efficient test based on a Gaussian approximation of the null distribution is also consistent.

Multivariate Rank-Based Distribution-Free Nonparametric Testing Using Measure Transportation

- Mathematics
- 2019

In this paper, we propose a general framework for distribution-free nonparametric testing in multi-dimensions, based on a notion of multivariate ranks defined using the theory of measure…

A Distribution-Free Test of Covariate Shift Using Conformal Prediction

- Computer Science, Mathematics
- 2020

This is the first successful attempt at using conformal prediction for testing statistical hypotheses, and the approach can be effectively combined with existing classification algorithms to find good conformity score functions.