# Theoretical Foundations of Equitability and the Maximal Information Coefficient

@article{Reshef2014TheoreticalFO, title={Theoretical Foundations of Equitability and the Maximal Information Coefficient}, author={Yakir A Reshef and David N. Reshef and Pardis C Sabeti and Michael Mitzenmacher}, journal={ArXiv}, year={2014}, volume={abs/1408.4908} }

The maximal information coefficient (MIC) is a tool for finding the strongest pairwise relationships in a data set with many variables [1]. MIC is useful because it gives similar scores to equally noisy relationships of different types. This property, called equitability, is important for analyzing high-dimensional data sets. Here we formalize the theory behind both equitability and MIC in the language of estimation theory. This formalization has a number of advantages. First, it allows us to show…
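As a rough illustration of the statistic the paper analyzes, here is a minimal sketch of a MIC-like score in Python. The function names are hypothetical, and the sketch uses only equal-width grids, whereas real MIC implementations (e.g. the authors' MINE software) optimize grid-line placement within the grid-size budget; it is an illustration of the normalization, not the authors' algorithm.

```python
import numpy as np

def grid_mutual_information(x, y, nx, ny):
    """Mutual information (in nats) of an equal-width nx-by-ny grid."""
    pxy, _, _ = np.histogram2d(x, y, bins=(nx, ny))
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

def mic_sketch(x, y, alpha=0.6):
    """Max over small grids of normalized mutual information, a crude MIC proxy.

    The grid-size budget B(n) = n^alpha with alpha = 0.6 follows the default
    in Reshef et al.; the normalization by log(min(nx, ny)) keeps each grid's
    score in [0, 1], which is what makes scores comparable across grids.
    """
    n = len(x)
    budget = int(n ** alpha)
    best = 0.0
    for nx in range(2, budget + 1):
        for ny in range(2, budget // nx + 1):  # enforce nx * ny <= budget
            mi = grid_mutual_information(x, y, nx, ny)
            best = max(best, mi / np.log(min(nx, ny)))
    return best
```

On a noiseless linear relationship the sketch scores 1, while an independent pair scores near 0, which is the behavior equitability asks a dependence measure to interpolate between as noise is added.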

## 8 Citations

### A Robust-Equitable Measure for Feature Ranking and Selection

- Computer Science, J. Mach. Learn. Res.
- 2017

A new concept of robust-Equitability is introduced and a robust-equitable copula dependence measure is identified, the robustCopula dependence (RCD) measure, which is based on the L1-distance of the copula density from uniform and it is proved theoretically that RCD is much easier to estimate than mutual information.
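To make the copula-based construction concrete, here is a crude plug-in sketch of an L1-type distance between the empirical copula density and the uniform density. The function name and binning scheme are hypothetical illustrations, not the RCD estimator from the paper (which is designed precisely to avoid the pitfalls of naive plug-in estimation).

```python
import numpy as np

def copula_l1_sketch(x, y, bins=8):
    """Naive plug-in estimate of (1/2) * integral |c(u, v) - 1| du dv,
    the L1 distance of a histogram copula density from uniform."""
    n = len(x)
    # rank-transform both margins onto (0, 1): the empirical copula sample
    u = (np.argsort(np.argsort(x)) + 0.5) / n
    v = (np.argsort(np.argsort(y)) + 0.5) / n
    counts, _, _ = np.histogram2d(u, v, bins=bins, range=[[0, 1], [0, 1]])
    density = counts * bins ** 2 / n   # piecewise-constant copula density
    # each cell has area 1/bins^2, so the integral is the mean deviation
    return 0.5 * float(np.mean(np.abs(density - 1.0)))
```

Because ranks are invariant under monotone transformations, the sketch gives the same score to `y = x` and `y = x**3`, which hints at why copula-based measures are attractive for equitability-style comparisons across relationship types.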

### A Novel Algorithm for the Precise Calculation of the Maximal Information Coefficient

- Computer Science, Scientific Reports
- 2014

The results show that when run on fruit fly data set including 1,000,000 pairs of gene expression profiles, the mean squared difference between SG and the exhaustive algorithm is 0.00075499, compared with 0.1834 in the case of ApproxMaxMI.

### What is equitability?

- Medicine
- 2014

This document serves to provide some basic background and understanding of MIC as well as to address some of the questions raised about MIC in the literature, and to provide pointers to relevant supporting work.

### Dependence in macroeconomic variables: Assessing instantaneous and persistent relations between and within time series

- Economics
- 2017

The present thesis comprises two rather independent chapters. In general, the diagnosis and quantification of dependence is a major aim of econometric studies. Along these lines, the concept of…

### Clustermatch: discovering hidden relations in highly diverse kinds of qualitative and quantitative data without standardization

- Computer Science, Bioinform.
- 2019

A new method named Clustermatch is designed to easily and efficiently perform data-mining tasks on large and highly heterogeneous datasets and is better suited for finding meaningful relationships in complex datasets.

### MIC for Analyzing Attributes Associated with Thai Agricultural Products

- Computer Science, GSKI
- 2017

This paper presents the theory behind MIC, a statistical method for measuring the correlation coefficient of pairwise variables on an immense dataset, together with related work.

### A Survey of Big Data Analytics Using Machine Learning Algorithms

- Computer Science, Research Anthology on Big Data Analytics, Architectures, and Applications
- 2022

Security issues in big data are reviewed, the performance of ML and DL in critical environments is evaluated, and the issues and challenges of ML together with their remedies are investigated.

### Detecting Associations Based on the Multi-Variable Maximum Information Coefficient

- Computer Science, IEEE Access
- 2021

An algorithm based on a greedy stepwise strategy and upper confidence bounds (UCB) for the approximate calculation of MMIC, a novel and widely applicable measure for association detection in large datasets that offers both generality and equitability.

## References

Showing 1-10 of 32 references.

### Equitability, mutual information, and the maximal information coefficient

- Computer Science, Proceedings of the National Academy of Sciences
- 2014

It is argued that equitability is properly formalized by a self-consistency condition closely related to the Data Processing Inequality, and shown that estimating mutual information provides a natural and practical method for equitably quantifying associations in large datasets.

### Measuring Dependence Powerfully and Equitably

- Computer Science, Mathematics, J. Mach. Learn. Res.
- 2016

This paper introduces and characterizes a population measure of dependence called MIC*, presents an efficient approach for computing MIC* from the density of a pair of random variables, and defines a new consistent, efficiently computable estimator MICe for MIC*.

### Cleaning up the record on the maximal information coefficient and equitability

- Computer Science, Proceedings of the National Academy of Sciences
- 2014

It is argued that regardless of whether “perfect” equitability is possible, approximate notions of equitability remain the right goal for many data exploration settings, and that mutual information is more equitable than MIC under a range of noise models.

### Equitability, interval estimation, and statistical power

- Mathematics, Statistical Science
- 2020

This work formally presents and characterizes equitability, a property of dependence measures that enables fruitful analysis of data sets with a small number of strong, interesting relationships and a large number of weaker ones, drawing on the equivalence of interval estimation and hypothesis testing.

### R2-equitability is satisfiable

- MathematicsProceedings of the National Academy of Sciences
- 2014

It is argued that the claim that no nontrivial dependence measure can satisfy R2-equitability is the result of a poorly constructed definition.

### Resolution dependence of the maximal information coefficient for noiseless relationship

- Computer Science, Stat. Comput.
- 2014

An iterative greedy algorithm is provided, as an alternative to the ApproxMaxMI algorithm proposed by Reshef et al., to determine the value of MIC through iterative optimization, which can be conducted in parallel.

### Estimating mutual information.

- Computer Science, Physical Review E: Statistical, Nonlinear, and Soft Matter Physics
- 2004

Two classes of improved estimators for the mutual information M(X,Y), from samples of random points distributed according to some joint probability density μ(x,y), based on entropy estimates from k-nearest-neighbor distances, are presented.
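The first of the two estimator classes described here (often called the KSG estimator) can be sketched in a few lines. This is a hypothetical brute-force O(n^2) implementation for one-dimensional X and Y, using an integer-argument digamma helper; production code would use k-d trees and a library digamma.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

def psi(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{k=1}^{n-1} 1/k."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, int(n)))

def ksg_mutual_information(x, y, k=3):
    """KSG mutual information estimate (first estimator of Kraskov et al.):
    I ~= psi(k) + psi(n) - < psi(n_x + 1) + psi(n_y + 1) >."""
    n = len(x)
    dx = np.abs(x[:, None] - x[None, :])
    dy = np.abs(y[:, None] - y[None, :])
    dz = np.maximum(dx, dy)                    # max-norm in the joint space
    eps = np.sort(dz, axis=1)[:, k]            # distance to the k-th neighbor
    nx = (dx < eps[:, None]).sum(axis=1) - 1   # marginal neighbors within eps
    ny = (dy < eps[:, None]).sum(axis=1) - 1   # (minus one for the point itself)
    avg = np.mean([psi(a + 1) + psi(b + 1) for a, b in zip(nx, ny)])
    return psi(k) + psi(n) - avg
```

Unlike fixed-grid histogram estimators, the neighborhood size adapts to the local density, which is the source of the improved bias behavior the abstract refers to.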

### Measuring Statistical Dependence with Hilbert-Schmidt Norms

- Computer Science, Mathematics, ALT
- 2005

We propose an independence criterion based on the eigen-spectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm…
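The empirical statistic behind this criterion (HSIC) has a compact closed form; a minimal sketch of the biased estimator with Gaussian kernels, assuming scalar inputs and a fixed bandwidth, is:

```python
import numpy as np

def hsic_biased(x, y, sigma=1.0):
    """Biased empirical HSIC with Gaussian kernels:
    HSIC = tr(K H L H) / n^2, where H is the centering matrix."""
    n = len(x)
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * sigma ** 2))
    L = np.exp(-(y[:, None] - y[None, :]) ** 2 / (2 * sigma ** 2))
    H = np.eye(n) - np.ones((n, n)) / n        # centers the kernel matrices
    return float(np.trace(K @ H @ L @ H)) / n ** 2
```

The statistic is nonnegative, converges to zero for independent variables as n grows, and with a characteristic kernel (such as the Gaussian) is zero in the population only under independence.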

### On the Maximum Correlation Coefficient

- Mathematics
- 2005

For an arbitrary random vector $(X,Y)$ and an independent random variable Z it is shown that the maximum correlation coefficient between X and $Y+\lambda Z$ as a function of $\lambda$ is lower…

### Detecting Novel Associations in Large Data Sets

- Mathematics, Science
- 2011

A measure of dependence for two-variable relationships: the maximal information coefficient (MIC), which captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination of the data relative to the regression function.