A fast algorithm for computing distance correlation

Authors: Arin Chaudhuri and Wenhao Hu
Journal: Comput. Stat. Data Anal.


The Chi-Square Test of Distance Correlation
It is proved that the chi-square test can be valid and universally consistent for testing independence, and a testing power inequality with respect to the permutation test is established.
High-Dimensional Independence Testing and Maximum Marginal Correlation
It is proved that the maximum method can be valid and universally consistent for testing high-dimensional dependence under regularity conditions, and it is demonstrated when and how the maximum method may outperform other methods.
An Alternate Unsupervised Technique Based on Distance Correlation and Shannon Entropy to Estimate λ0-Fuzzy Measure
This study has contributed an alternate unsupervised technique that can estimate λ0-measure values without necessitating any additional data from the decision-makers, and at the same time can better capture the interdependencies held by the attributes.
Optimal Projections in the Distance-Based Statistical Methods
This paper shows that the exact solution of the nonconvex optimization problem can be derived in two special cases: the dimension of the data is equal to either 2 or the number of projection directions, and proposes an algorithm to find approximate solutions.
Rate-Optimality of Consistent Distribution-Free Tests of Independence Based on Center-Outward Ranks and Signs
Rank correlations have found many innovative applications in the last decade. In particular, suitable versions of rank correlations have been used for consistent tests of independence between pairs of
The Exact Equivalence of Distance and Kernel Methods for Hypothesis Testing
A new bijective transformation between metrics and kernels is proposed that simplifies the fixed-point transformation, inherits similar theoretical properties, allows distance methods to be exactly the same as kernel methods for sample statistics and p-value, and better preserves the data structure upon transformation.
The Exact Equivalence of Independence Testing and Two-Sample Testing
It is shown that two-sample testing is a special case of independence testing via an auxiliary label vector, and it is proved that distance correlation is exactly equivalent to the energy statistic in terms of the population statistic, the sample statistic, and the testing p-value via permutation test.
Approximate Bayesian Computation Via the Energy Statistic
This work establishes a new asymptotic result for the case where both the observed sample size and the simulated data sample size increase to infinity, and proves that the rejection ABC algorithm, based on the energy statistic, generates pseudo-posterior distributions that achieve convergence to the correct limits when implemented with rejection thresholds that converge to zero, in the finite sample setting.
Data Reduction with Distance Correlation
  • K. George
  • Computer Science, Sociology
  • 2021
This paper examines distance correlation (DC) as a technique for determining similar data sources and defines and uses a variation of concordance for validation analysis.


A Statistically and Numerically Efficient Independence Test Based on Random Projections and Distance Covariance
  • Cheng Huang, X. Huo
  • Computer Science
    Frontiers in Applied Mathematics and Statistics
  • 2021
A test of independence based on random projection and distance correlation that achieves nearly the same power as the state-of-the-art distance-based approach, works in multivariate cases, and has O(nK log n) computational complexity and an O(max{n, K}) memory requirement.
Measuring and testing dependence by correlation of distances
Distance correlation is a new measure of dependence between random vectors that is based on certain Euclidean distances between sample elements rather than sample moments, yet has a compact representation analogous to the classical covariance and correlation.
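As an illustration of the "correlation of distances" idea summarized above, the following is a minimal sketch of the sample distance correlation computed directly from double-centered pairwise Euclidean distance matrices. It is the naive quadratic-time computation, not the fast algorithm of the title paper, and the function names (`distance_correlation` and its helpers) are hypothetical names chosen for this example.

```python
import math


def _as_point(value):
    # Treat a bare number as a one-dimensional point; otherwise copy the coordinates.
    return tuple(value) if hasattr(value, "__iter__") else (float(value),)


def _centered_distances(sample):
    # Pairwise Euclidean distances, then double centering:
    # A[i][j] = d[i][j] - row_mean[i] - row_mean[j] + grand_mean.
    # The distance matrix is symmetric, so column means equal row means.
    points = [_as_point(v) for v in sample]
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row_mean = [sum(row) / n for row in d]
    grand_mean = sum(row_mean) / n
    return [
        [d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def _mean_product(a, b):
    # Average of the elementwise product of two n-by-n matrices.
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    # Naive O(n^2) sample distance correlation (assumed V-statistic form).
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of observations")
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = _mean_product(a, b)
    denom = math.sqrt(_mean_product(a, a) * _mean_product(b, b))
    if denom == 0.0:
        return 0.0  # convention when either sample is constant
    return math.sqrt(max(dcov2, 0.0) / denom)


if __name__ == "__main__":
    xs = [i / 50.0 - 1.0 for i in range(101)]  # symmetric grid on [-1, 1]
    print(distance_correlation(xs, [2.0 * v + 1.0 for v in xs]))  # linear relation
    print(distance_correlation(xs, [v * v for v in xs]))  # nonlinear relation
```

In this sketch a quadratic relation such as y = x² on a symmetric grid, whose Pearson correlation is zero, still gives a clearly positive value, which is the kind of dependence the measure is designed to detect.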
Equivalence of distance-based and RKHS-based statistics in hypothesis testing
It is shown that the energy distance most commonly employed in statistics is just one member of a parametric family of kernels, and that other choices from this family can yield more powerful tests.
Measuring Statistical Dependence with Hilbert-Schmidt Norms
We propose an independence criterion based on the eigen-spectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm
Large-scale kernel methods for independence testing
This contribution provides an extensive study of the use of large-scale kernel approximations in the context of independence testing, contrasting block-based, Nyström and random Fourier feature approaches and demonstrates that the methods give comparable performance with existing methods while using significantly less computation time and memory.
Kernel-based Tests for Joint Independence
This work embeds the joint distribution and the product of the marginals in a reproducing kernel Hilbert space and defines the d‐variable Hilbert–Schmidt independence criterion dHSIC as the squared distance between the embeddings.
On Brownian Distance Covariance and High Dimensional Data.
  • M. Kosorok
  • Mathematics
    The annals of applied statistics
  • 2009
The very interesting concept of Brownian distance covariance developed by Székely and Rizzo (2009) is discussed and two possible extensions are described, including certain high throughput screening and functional data settings.
A Kernel Statistical Test of Independence
A novel test of the independence hypothesis for one particular kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC), which outperforms established contingency table and functional correlation-based tests, with an advantage that is greater for multivariate data.
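For orientation, a minimal sketch of a biased empirical HSIC statistic for scalar samples is given below. It assumes a Gaussian kernel with a fixed bandwidth and a 1/n² normalization; both are choices made for this example, and the function names are hypothetical. It computes only the statistic, not the test threshold described in the paper.

```python
import math


def _centered_gaussian_kernel(sample, bandwidth):
    # Gaussian kernel matrix K[i][j] = exp(-(s_i - s_j)^2 / (2 * bandwidth^2)),
    # followed by double centering (equivalent to H K H with H = I - 11^T / n).
    n = len(sample)
    k = [
        [math.exp(-((u - v) ** 2) / (2.0 * bandwidth ** 2)) for v in sample]
        for u in sample
    ]
    row_mean = [sum(row) / n for row in k]
    grand_mean = sum(row_mean) / n
    return [
        [k[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def hsic_biased(x, y, bandwidth=1.0):
    # Biased HSIC estimate: trace(K H L H) / n^2, written as the mean of the
    # elementwise product of the two centered kernel matrices.
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of observations")
    n = len(x)
    kc = _centered_gaussian_kernel(x, bandwidth)
    lc = _centered_gaussian_kernel(y, bandwidth)
    return sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n)) / (n * n)


if __name__ == "__main__":
    xs = [i / 10.0 - 2.0 for i in range(41)]
    print(hsic_biased(xs, [v * v for v in xs]))  # dependent samples
    print(hsic_biased(xs, [1.0] * len(xs)))  # constant second sample
```

The statistic is zero when one sample is constant and positive for the dependent pair; turning it into a test requires a null distribution, for example via permutation, which this sketch leaves out.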
Measuring Nonlinear Dependence in Time‐Series, a Distance Correlation Approach
This article proposes and theoretically verifies a subsampling methodology for inference on the sample ADCF for dependent data, and provides a useful tool for exploring nonlinear dependence structures in time‐series.
Inferring Nonlinear Gene Regulatory Networks from Gene Expression Data Based on Distance Correlation
This work proposes three DC-based GRN inference algorithms and compares them with mutual information (MI)-based algorithms by analyzing two simulated datasets (benchmark GRNs from the DREAM challenge and GRNs generated by the SynTReN network generator) and an experimentally determined SOS DNA repair network in Escherichia coli.