Corpus ID: 83390

Measuring Dependence Powerfully and Equitably

@article{Reshef2016MeasuringDP,
  title={Measuring Dependence Powerfully and Equitably},
  author={Yakir A. Reshef and David N. Reshef and Hilary K. Finucane and Pardis C. Sabeti and Michael Mitzenmacher},
  journal={ArXiv},
  year={2016},
  volume={abs/1505.02213}
}
Given a high-dimensional data set we often wish to find the strongest relationships within it. A common strategy is to evaluate a measure of dependence on every variable pair and retain the highest-scoring pairs for follow-up. This strategy works well if the statistic used is equitable [Reshef et al. 2015a], i.e., if, for some measure of noise, it assigns similar scores to equally noisy relationships regardless of relationship type (e.g., linear, exponential, periodic). In this paper, we… 
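The screening strategy the abstract describes (score every variable pair, keep the top scorers) can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's method: the `top_pairs` helper is hypothetical, and absolute Pearson correlation is used only as a stand-in dependence measure — it is notably *not* equitable across relationship types, which is exactly the problem the paper addresses.

```python
import numpy as np
from itertools import combinations

def top_pairs(data, score, k=5):
    """Score every column pair of `data` with a dependence measure
    and return the k highest-scoring pairs."""
    scores = {(i, j): score(data[:, i], data[:, j])
              for i, j in combinations(range(data.shape[1]), 2)}
    return sorted(scores.items(), key=lambda kv: -kv[1])[:k]

# Absolute Pearson correlation as a stand-in measure; an equitable
# statistic such as MICe would be dropped in here instead.
def abs_pearson(a, b):
    return abs(np.corrcoef(a, b)[0, 1])
```

With an equitable statistic plugged in as `score`, equally noisy pairs receive similar ranks regardless of whether the underlying relationship is linear, exponential, or periodic.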


Equitability, interval estimation, and statistical power

TLDR
This work formally presents and characterizes equitability, a property of measures of dependence that enables fruitful analysis of data sets with a small number of strong, interesting relationships and a large number of weaker ones, drawing on the equivalence of interval estimation and hypothesis testing to analyze this property.

An empirical study of the maximal and total information coefficients and leading measures of dependence

TLDR
An empirical evaluation of the equitability, power against independence, and runtime of several leading measures of dependence, including the two recently introduced and simultaneously computable statistics MICe and TICe, whose goal is equitability.

An Empirical Study of Leading Measures of Dependence

TLDR
An extensive empirical evaluation of the equitability, power against independence, and runtime of several leading measures of dependence finds that MICe is the most equitable method on functional relationships in most of the settings the authors considered, although mutual information estimation proves the most equitable at large sample sizes in some specific settings.

Theoretical Foundations of Equitability and the Maximal Information Coefficient

TLDR
This paper formalizes the theory behind both equitability and MIC in the language of estimation theory and proves an alternate, equivalent characterization of MIC that is used to state new estimators of it as well as an algorithm for explicitly computing it when the joint probability density function of a pair of random variables is known.

A Robust-Equitable Measure for Feature Ranking and Selection

TLDR
A new concept of robust-equitability is introduced and a robust-equitable copula dependence measure, the robust copula dependence (RCD) measure, is identified; RCD is based on the L1-distance of the copula density from uniform, and it is proved theoretically that RCD is much easier to estimate than mutual information.

Symmetric rank covariances: a generalized framework for nonparametric measures of dependence

TLDR
Symmetric rank covariances are a new class of multivariate nonparametric measures of dependence that generalises several existing measures and leads naturally to multivariate extensions of the Bergsma–Dassios sign covariance.

The randomized information coefficient: assessing dependencies in noisy data

TLDR
This work formally establishes the importance of achieving low variance when comparing relationships using the mutual information estimated with grids and experimentally demonstrates the effectiveness of RIC for detecting noisy dependencies and ranking dependencies for the applications of genetic network inference and feature selection for regression.

Design and adjustment of dependency measures

TLDR
This thesis formalizes a framework for adjusting dependency measures in order to correct for biases, applies these adjustments to existing dependency measures between variables, and shows how to achieve better interpretability in quantification.

Estimating scale-invariant directed dependence of bivariate distributions

A Copula Statistic for Measuring Nonlinear Multivariate Dependence

A new index based on empirical copulas, termed the Copula Statistic (CoS), is introduced for assessing the strength of multivariate dependence and for testing statistical independence. New properties
...

References

SHOWING 1-10 OF 42 REFERENCES

Equitability, interval estimation, and statistical power

TLDR
This work formally presents and characterizes equitability, a property of measures of dependence that enables fruitful analysis of data sets with a small number of strong, interesting relationships and a large number of weaker ones, drawing on the equivalence of interval estimation and hypothesis testing to analyze this property.

An Empirical Study of Leading Measures of Dependence

TLDR
An extensive empirical evaluation of the equitability, power against independence, and runtime of several leading measures of dependence finds that MICe is the most equitable method on functional relationships in most of the settings the authors considered, although mutual information estimation proves the most equitable at large sample sizes in some specific settings.

Copula Correlation: An Equitable Dependence Measure and Extension of Pearson's Correlation

TLDR
It is shown that MI does not correctly reflect the proportion of deterministic signals hidden in noisy data, and the copula correlation (Ccor), based on the L1-distance of copula density, is shown to be equitable under both definitions.

Comment on “Detecting Novel Associations in Large Data Sets”

TLDR
A novel measure of dependence, the maximal information coefficient (MIC), aimed to capture a wide range of associations between pairs of variables; a statistical test for independence based on MIC is presented, and the pairs with the lowest p-values are studied.

Equitability, mutual information, and the maximal information coefficient

  • J. Kinney, G. Atwal
  • Computer Science
    Proceedings of the National Academy of Sciences
  • 2014
TLDR
It is argued that equitability is properly formalized by a self-consistency condition closely related to Data Processing Inequality, and shown that estimating mutual information provides a natural and practical method for equitably quantifying associations in large datasets.

A Kernel Two-Sample Test

TLDR
This work proposes a framework for analyzing and comparing distributions, which is used to construct statistical tests to determine if two samples are drawn from different distributions, and presents two distribution free tests based on large deviation bounds for the maximum mean discrepancy (MMD).
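As an illustration of the MMD statistic this TLDR describes, here is a minimal sketch under stated assumptions: a fixed-bandwidth Gaussian kernel on univariate samples, in the simple biased V-statistic form, without the large-deviation thresholds the paper derives (the function name and bandwidth choice are illustrative, not from the paper).

```python
import numpy as np

def mmd2_biased(x, y, sigma=1.0):
    """Biased estimate of squared maximum mean discrepancy (MMD^2)
    between 1-D samples x and y under a Gaussian kernel."""
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()
```

When x and y come from the same distribution the estimate is close to zero (up to a small positive bias of order 1/n from the kernel diagonal); a shift in distribution drives it well away from zero.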

Estimating mutual information.

TLDR
Two classes of improved estimators for the mutual information M(X,Y), from samples of random points distributed according to some joint probability density μ(x,y), based on entropy estimates from k-nearest neighbor distances, are presented.
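A minimal brute-force sketch of the first Kraskov–Stögbauer–Grassberger (KSG) k-nearest-neighbor estimator this TLDR refers to, specialised to a pair of 1-D variables. This is an O(n²) illustration, not the authors' implementation; production code would use a spatial index for the neighbor searches.

```python
import numpy as np
from scipy.special import digamma

def ksg_mi(x, y, k=3):
    """KSG estimator (variant 1) of mutual information, in nats."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    dx = np.abs(x[:, None] - x[None, :])
    dy = np.abs(y[:, None] - y[None, :])
    dz = np.maximum(dx, dy)               # max-norm distance in the joint space
    np.fill_diagonal(dz, np.inf)          # a point is not its own neighbour
    eps = np.sort(dz, axis=1)[:, k - 1]   # distance to each point's k-th neighbour
    nx = (dx < eps[:, None]).sum(axis=1) - 1   # marginal counts (minus self)
    ny = (dy < eps[:, None]).sum(axis=1) - 1
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))
```

The estimate is near zero for independent samples and grows with the strength of the dependence, without requiring any binning of the data.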

Consistent Distribution-Free K-Sample and Independence Tests for Univariate Random Variables

TLDR
It is shown that the test statistics based on summation can serve as good estimators of the mutual information and are almost as powerful as the tests based on the optimal partition size, in simulations as well as on a real data example.

Cleaning up the record on the maximal information coefficient and equitability

TLDR
It is argued that regardless of whether “perfect” equitability is possible, approximate notions of equitability remain the right goal for many data exploration settings, and that mutual information is more equitable than MIC under a range of noise models.

Comment on "Detecting Novel Associations In Large Data Sets" by Reshef Et Al, Science Dec 16, 2011

The proposal of Reshef et al. (2011) is an interesting new approach for discovering non-linear dependencies among pairs of measurements in exploratory data mining. However, it has a potentially