
Theoretical Foundations of Equitability and the Maximal Information Coefficient

@article{Reshef2014TheoreticalFO,
  title={Theoretical Foundations of Equitability and the Maximal Information Coefficient},
  author={Yakir A. Reshef and David N. Reshef and Pardis C. Sabeti and Michael Mitzenmacher},
  journal={ArXiv},
  year={2014},
  volume={abs/1408.4908}
}
The maximal information coefficient (MIC) is a tool for finding the strongest pairwise relationships in a data set with many variables [1]. MIC is useful because it gives similar scores to equally noisy relationships of different types. This property, called equitability, is important for analyzing high-dimensional data sets. Here we formalize the theory behind both equitability and MIC in the language of estimation theory. This formalization has a number of advantages. First, it allows us to show…
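Concretely, MIC scores a pair of variables by maximizing a normalized mutual information over two-dimensional grids: MIC(x, y) = max over grids with kx·ky ≤ B(n) of I(x, y; grid) / log min(kx, ky). The Python sketch below illustrates that definition under simplifying assumptions: it uses equal-frequency grid cuts and a brute-force search over grid sizes, whereas the actual ApproxMaxMI algorithm of [1] optimizes the grid boundaries themselves. Function names are ours; treat this as a toy illustration, not the authors' procedure.

```python
import numpy as np

def mutual_info_grid(x, y, kx, ky):
    """Empirical mutual information (in nats) of (x, y) binned on a
    kx-by-ky equal-frequency grid.  Assumes continuous data (no ties)."""
    xe = np.quantile(x, np.linspace(0, 1, kx + 1))   # bin edges from quantiles
    ye = np.quantile(y, np.linspace(0, 1, ky + 1))
    counts, _, _ = np.histogram2d(x, y, bins=[xe, ye])
    p = counts / counts.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = p * np.log(p / (px * py))
    return np.nansum(terms)                          # empty cells contribute 0

def mic_sketch(x, y, B=None):
    """Brute-force MIC: maximize normalized grid MI over all grid sizes
    with kx * ky <= B.  B defaults to n**0.6, the bound suggested in [1]."""
    n = len(x)
    B = B if B is not None else int(n ** 0.6)
    best = 0.0
    for kx in range(2, B // 2 + 1):
        for ky in range(2, B // kx + 1):
            i = mutual_info_grid(x, y, kx, ky)
            best = max(best, i / np.log(min(kx, ky)))
    return best
```

On a noiseless functional relationship this score approaches 1 and it approaches 0 for independent variables; equitability is the stronger requirement that equally noisy relationships of different types land near the same intermediate value.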

Citations

A Robust-Equitable Measure for Feature Ranking and Selection

A new concept of robust-equitability is introduced and a robust-equitable copula dependence measure, the robust copula dependence (RCD) measure, is identified. RCD is based on the L1-distance of the copula density from uniform, and it is proved theoretically that RCD is much easier to estimate than mutual information.
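To make the L1-from-uniform idea concrete, here is a naive histogram plug-in sketch in Python: rank-transform the sample onto the unit square (the empirical copula) and integrate the absolute deviation of a histogram copula density from the uniform density. The cited paper's actual estimator, normalization, and tuning differ; everything named here is illustrative.

```python
import numpy as np
from scipy.stats import rankdata

def copula_l1_sketch(x, y, bins=10):
    """Naive plug-in estimate of 0.5 * integral |c(u, v) - 1| du dv,
    where c is the copula density: 0 at independence, approaching 1
    under strong dependence."""
    n = len(x)
    u = rankdata(x) / (n + 1)          # probability-integral transform
    v = rankdata(y) / (n + 1)
    counts, _, _ = np.histogram2d(u, v, bins=bins, range=[[0, 1], [0, 1]])
    c_hat = counts / counts.sum() * bins ** 2    # density estimate per cell
    cell_area = 1.0 / bins ** 2
    return 0.5 * np.sum(np.abs(c_hat - 1.0)) * cell_area
```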

A Novel Algorithm for the Precise Calculation of the Maximal Information Coefficient

The results show that, on a fruit fly data set comprising 1,000,000 pairs of gene expression profiles, the mean squared difference between SG and the exhaustive algorithm is 0.00075499, compared with 0.1834 for ApproxMaxMI.

What is equitability?

  • Medicine
  • 2014
This document provides basic background on and understanding of MIC, addresses some of the questions raised about MIC in the literature, and gives pointers to relevant supporting work.

Dependence in macroeconomic variables: Assessing instantaneous and persistent relations between and within time series

The present thesis comprises two rather independent chapters. In general, the diagnosis and quantification of dependence is a major aim of econometric studies. Along these lines, the concept of…

Clustermatch: discovering hidden relations in highly diverse kinds of qualitative and quantitative data without standardization

A new method named Clustermatch is designed to perform data-mining tasks easily and efficiently on large and highly heterogeneous datasets, and is better suited to finding meaningful relationships in complex datasets.

MIC for Analyzing Attributes Associated with Thai Agricultural Products

This paper presents the theory behind MIC, a statistical method for measuring the correlation of pairwise variables in an immense dataset, and surveys related work.

A Survey of Big Data Analytics Using Machine Learning Algorithms

  • Usha Moorthy, U. Gandhi
  • Computer Science
    Research Anthology on Big Data Analytics, Architectures, and Applications
  • 2022
Security issues in big data are reviewed, the performance of ML and DL in critical environments is evaluated, and the issues and challenges of ML, together with their remedies, are investigated.

Detecting Associations Based on the Multi-Variable Maximum Information Coefficient

An algorithm based on a greedy stepwise strategy and the upper confidence bound (UCB) is proposed for the approximate calculation of MMIC, a novel measure for detecting associations in large datasets that has both generality and equitability.
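The summary does not say exactly how UCB enters the stepwise search, but for orientation, the standard UCB1 index that such a greedy strategy could use to balance exploring untried candidates against exploiting promising ones looks like this (a generic sketch; the function name and exploration constant are ours, not from the paper):

```python
import math

def ucb1_index(mean_reward, times_chosen, total_steps, c=math.sqrt(2)):
    """UCB1: sample mean plus an exploration bonus that shrinks as a
    candidate is evaluated more often; untried candidates come first."""
    if times_chosen == 0:
        return float("inf")
    return mean_reward + c * math.sqrt(math.log(total_steps) / times_chosen)
```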

References

Showing 1–10 of 32 references

Equitability, mutual information, and the maximal information coefficient

  • J. Kinney, G. Atwal
  • Computer Science
    Proceedings of the National Academy of Sciences
  • 2014
It is argued that equitability is properly formalized by a self-consistency condition closely related to the Data Processing Inequality, and shown that estimating mutual information provides a natural and practical method for equitably quantifying associations in large datasets.

Measuring Dependence Powerfully and Equitably

This paper introduces and characterizes a population measure of dependence called MIC*, presents an efficient approach for computing MIC* from the density of a pair of random variables, and defines a new consistent estimator MICe of MIC* that is efficiently computable.

Cleaning up the record on the maximal information coefficient and equitability

It is argued that regardless of whether “perfect” equitability is possible, approximate notions of equitability remain the right goal for many data exploration settings, and that MIC is more equitable than mutual information estimation under a range of noise models.

Equitability, interval estimation, and statistical power

This work formally presents and characterizes equitability, a property of measures of dependence that enables fruitful analysis of data sets containing a small number of strong, interesting relationships and a large number of weaker ones, and uses the equivalence of interval estimation and hypothesis testing to characterize this property in terms of statistical power.
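In compressed form, and with illustrative notation that may differ from the paper's exact formulation, the interval-based definition runs roughly as follows. Fix a family $\mathcal{Q}$ of standard relationships (e.g., noisy functional relationships) and a property of interest $\Phi$ such as $R^2$. For a measure of dependence $\hat{\varphi}$, let

$$I(y) = \bigl\{\, \Phi(Z) : Z \in \mathcal{Q},\ y \text{ is a plausible value of } \hat{\varphi} \text{ on } Z \,\bigr\}$$

be the set of property values consistent with an observed score $y$. The measure is $d$-equitable on $\mathcal{Q}$ when $\sup_y \operatorname{diam} I(y) \le d$: the smaller the worst-case interval, the more precisely a score translates back into a statement about $\Phi$, with perfect equitability as the limit $d \to 0$.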

R2-equitability is satisfiable

It is argued that the claim that no nontrivial dependence measure can satisfy R2-equitability is the result of a poorly constructed definition.

Resolution dependence of the maximal information coefficient for noiseless relationship

A greedy algorithm is provided, as an alternative to the ApproxMaxMI algorithm proposed by Reshef et al., to determine the value of MIC through iterative optimization, which can be run in parallel.

Estimating mutual information.

Two classes of improved estimators for mutual information M(X,Y), from samples of random points distributed according to some joint probability density μ(x,y), based on entropy estimates from k-nearest neighbor distances, are presented.
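For reference, the first of the two estimators described here (commonly called the KSG estimator) fits in a few lines. This O(n²) brute-force Python sketch uses max-norm distances and the formula I ≈ ψ(k) + ψ(N) − ⟨ψ(nx+1) + ψ(ny+1)⟩; the variable names are ours:

```python
import numpy as np
from scipy.special import digamma

def ksg_mutual_info(x, y, k=3):
    """Kraskov-Stoegbauer-Grassberger estimator (algorithm 1) of
    I(X; Y) in nats for 1-D samples; brute force for clarity."""
    n = len(x)
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    dx = np.abs(x[:, None] - x[None, :])     # pairwise distances in x
    dy = np.abs(y[:, None] - y[None, :])     # pairwise distances in y
    dz = np.maximum(dx, dy)                  # max-norm in the joint space
    np.fill_diagonal(dz, np.inf)             # exclude self-pairs
    eps = np.sort(dz, axis=1)[:, k - 1]      # k-th nearest-neighbor distance
    # Count neighbors strictly inside eps in each marginal (minus self).
    nx = np.sum(dx < eps[:, None], axis=1) - 1
    ny = np.sum(dy < eps[:, None], axis=1) - 1
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))
```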

Measuring Statistical Dependence with Hilbert-Schmidt Norms

We propose an independence criterion based on the eigen-spectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator.
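A minimal Python sketch of the biased empirical statistic from this line of work, HSIC = (n−1)⁻² tr(KHLH) with Gaussian kernels, is below. The bandwidth choice and names are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def hsic_sketch(x, y, sigma=1.0):
    """Biased empirical HSIC with Gaussian kernels: near 0 for
    independent samples, larger under dependence."""
    n = len(x)
    x = np.asarray(x, float).reshape(n, 1)
    y = np.asarray(y, float).reshape(n, 1)
    K = np.exp(-((x - x.T) ** 2) / (2 * sigma ** 2))   # kernel matrix on X
    L = np.exp(-((y - y.T) ** 2) / (2 * sigma ** 2))   # kernel matrix on Y
    H = np.eye(n) - np.ones((n, n)) / n                # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

For instance, hsic_sketch(x, x + noise) comes out clearly positive while hsic_sketch(x, y) for independent x and y hovers near zero.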

On the Maximum Correlation Coefficient

For an arbitrary random vector (X, Y) and an independent random variable Z, it is shown that the maximum correlation coefficient between X and Y + λZ, as a function of λ, is lower…

Detecting Novel Associations in Large Data Sets

A measure of dependence for two-variable relationships is presented: the maximal information coefficient (MIC), which captures a wide range of associations, both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R2) of the data relative to the regression function.