• Corpus ID: 10581020

# Theoretical Foundations of Equitability and the Maximal Information Coefficient

@article{Reshef2014TheoreticalFO,
title={Theoretical Foundations of Equitability and the Maximal Information Coefficient},
author={Yakir A. Reshef and David N. Reshef and Pardis C. Sabeti and Michael Mitzenmacher},
journal={ArXiv},
year={2014},
volume={abs/1408.4908}
}
• Published 21 August 2014
• Computer Science
• ArXiv
The maximal information coefficient (MIC) is a tool for finding the strongest pairwise relationships in a data set with many variables [1]. MIC is useful because it gives similar scores to equally noisy relationships of different types. This property, called equitability, is important for analyzing high-dimensional data sets. Here we formalize the theory behind both equitability and MIC in the language of estimation theory. This formalization has a number of advantages. First, it allows us to show…
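The statistic the abstract describes can be illustrated with a deliberately simplified sketch: MIC is the maximum, over grids whose cell count is bounded by a function of the sample size, of the grid-induced mutual information normalized by log of the smaller grid dimension. The sketch below searches only equal-frequency grids rather than optimizing grid placements as the full algorithm does, so it is an approximation for illustration; the names `mutual_info_grid` and `mic_sketch` are this sketch's own, not from the paper.

```python
import numpy as np
from math import log2

def mutual_info_grid(x, y, nx, ny):
    """Mutual information (bits) of the discretization of (x, y)
    induced by an equal-frequency grid with nx columns and ny rows."""
    # Interior quantile edges give (approximately) equal-frequency bins;
    # this is a simplification of the grid optimization in the true MIC.
    ex = np.quantile(x, np.linspace(0, 1, nx + 1)[1:-1])
    ey = np.quantile(y, np.linspace(0, 1, ny + 1)[1:-1])
    cx = np.searchsorted(ex, x, side="right")
    cy = np.searchsorted(ey, y, side="right")
    joint = np.zeros((nx, ny))
    for i, j in zip(cx, cy):
        joint[i, j] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1)
    py = joint.sum(axis=0)
    mi = 0.0
    for i in range(nx):
        for j in range(ny):
            if joint[i, j] > 0:
                mi += joint[i, j] * log2(joint[i, j] / (px[i] * py[j]))
    return mi

def mic_sketch(x, y, alpha=0.6):
    """Simplified MIC: maximize I(nx, ny) / log2(min(nx, ny)) over
    equal-frequency grids with nx * ny <= n**alpha cells."""
    n = len(x)
    bound = int(n ** alpha)
    best = 0.0
    for nx in range(2, bound + 1):
        for ny in range(2, bound // nx + 1):
            score = mutual_info_grid(x, y, nx, ny) / log2(min(nx, ny))
            best = max(best, score)
    return best
```

On a noiseless functional relationship such as `y = x` this score is close to 1, while on independent samples it stays near 0, which is the behavior the equitability discussion turns on.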

## Citations

• Computer Science
J. Mach. Learn. Res.
• 2017
A new concept of robust equitability is introduced and a robust-equitable copula dependence measure, the robust copula dependence (RCD) measure, is identified; RCD is based on the L1-distance of the copula density from uniform, and it is proved theoretically that RCD is much easier to estimate than mutual information.
• Computer Science
Scientific reports
• 2014
The results show that when run on a fruit fly data set of 1,000,000 pairs of gene expression profiles, the mean squared difference between SG and the exhaustive algorithm is 0.00075499, compared with 0.1834 in the case of ApproxMaxMI.
• Medicine
• 2014
This document serves to provide some basic background and understanding of MIC as well as to address some of the questions raised about MIC in the literature, and to provide pointers to relevant supporting work.
The present thesis comprises two rather independent chapters. In general, the diagnosis and quantification of dependence is a major aim of econometric studies. Along these lines, the concept of
• Computer Science
Bioinform.
• 2019
A new method named Clustermatch is designed to easily and efficiently perform data-mining tasks on large and highly heterogeneous datasets and is better suited for finding meaningful relationships in complex datasets.
• Computer Science
GSKI
• 2017
This paper presents the theory of MIC, a statistical method for measuring the correlation of pairwise variables in an immense dataset, together with related work.
• Computer Science
Research Anthology on Big Data Analytics, Architectures, and Applications
• 2022
Security issues in big data are reviewed, the performance of ML and DL in critical environments is evaluated, and the issues and challenges of ML, along with their remedies, are investigated.
• Computer Science
IEEE Access
• 2021
An algorithm based on a greedy stepwise strategy and the upper confidence bound (UCB) is proposed for the approximate calculation of MMIC, a novel and widely used measure of association detection in large datasets that has both generality and equitability.

## References

Showing 1-10 of 32 references

• Computer Science
Proceedings of the National Academy of Sciences
• 2014
It is argued that equitability is properly formalized by a self-consistency condition closely related to the Data Processing Inequality, and shown that estimating mutual information provides a natural and practical method for equitably quantifying associations in large datasets.
• Computer Science, Mathematics
J. Mach. Learn. Res.
• 2016
This paper introduces and characterizes a population measure of dependence called MIC*, presents an efficient approach for computing MIC* from the density of a pair of random variables, and defines a new consistent estimator MICe for MIC* that is efficiently computable.
• Computer Science
Proceedings of the National Academy of Sciences
• 2014
It is argued that regardless of whether “perfect” equitability is possible, approximate notions of equitability remain the right goal for many data exploration settings, and that mutual information is more equitable than MIC under a range of noise models.
• Mathematics
Statistical Science
• 2020
This work formally presents and characterizes equitability, a property of measures of dependence that enables fruitful analysis of data sets with a small number of strong, interesting relationships and a large number of weaker ones, drawing on the equivalence of interval estimation and hypothesis testing.
• Mathematics
Proceedings of the National Academy of Sciences
• 2014
It is argued that no nontrivial dependence measure can satisfy R²-equitability, and that this is the result of a poorly constructed definition.
• Computer Science
Stat. Comput.
• 2014
An iterative greedy algorithm is provided, as an alternative to the ApproxMaxMI proposed by Reshef et al., to determine the value of MIC through iterative optimization, which can be run in parallel.
• Computer Science
Physical review. E, Statistical, nonlinear, and soft matter physics
• 2004
Two classes of improved estimators for the mutual information M(X,Y), from samples of random points distributed according to some joint probability density mu(x,y), based on entropy estimates from k-nearest neighbor distances are presented.
• Computer Science, Mathematics
ALT
• 2005
We propose an independence criterion based on the eigen-spectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator.
• Mathematics
• 2005
For an arbitrary random vector $(X,Y)$ and an independent random variable Z it is shown that the maximum correlation coefficient between X and $Y+\lambda Z$ as a function of $\lambda$ is lower
• Mathematics
Science
• 2011
A measure of dependence for two-variable relationships: the maximal information coefficient (MIC), which captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination of the data relative to the regression function.