Comment on "Detecting Novel Associations In Large Data Sets" by Reshef Et Al, Science Dec 16, 2011

  • Noah Simon, Robert Tibshirani
  • arXiv: Methodology
The proposal of Reshef et al. (2011) is an interesting new approach for discovering non-linear dependencies among pairs of measurements in exploratory data mining. However, it has a potentially serious drawback. The authors laud the fact that MIC has no preference for some alternatives over others, but as the authors know, there is no free lunch in statistics: tests that strive for high power against all alternatives can have low power in many important situations. To investigate this, we…

Detecting a Wide Diversity of Associations in Very Large Data Sets
A novel exploratory approach to finding associations in large data sets has been developed based on a bias-corrected mutual information measure that has the advantage of being able to identify non-linear as well as linear relationships.
A practical tool for maximal information coefficient analysis
Background: The ability to find complex associations in large omics datasets, assess their significance, and prioritize them according to their strength can be of great help in the data…
Uniform Partitioning of Data Grid for Association Detection
This article introduces the uniform information coefficient (UIC), which measures the amount of dependence between two multidimensional variables and is able to detect both linear and non-linear associations.
Gene coexpression measures in large heterogeneous samples using count statistics
Two new robust count statistics to account for local patterns in gene expression profiles are proposed based on counting local patterns of gene expression ranks to take into account the potentially diverse nature of gene interactions.
A Novel Test for Independence Derived from an Exact Distribution of ith Nearest Neighbours
This work provides an exact formula for the ith nearest neighbor distance distribution of rank-transformed data and proposes two novel tests for independence, concluding that no particular method is generally superior to all other methods.
Measuring and Discovering Correlations in Large Data Sets
  • Lijue Liu, Ming Li, Sha Wen
  • Computer Science
    2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing
  • 2013
A class of statistics named ART (the alternant recursive topology statistics) is proposed to measure the properties of correlation between two variables to compensate for the disadvantages of Reshef's model.
A procedure to detect general association based on concentration of ranks
RankCover is described, a new non-parametric test of association between two variables that measures the concentration of paired ranked points and is robust and often powerful in comparison to competing general association tests.
Detecting direct associations in a network by information theoretic approaches
The traditional approaches or measurements on the associations among the observed variables, such as correlation coefficient, mutual information and conditional mutual information (CMI), are reviewed, and recently developed theories and methods are summarized.
Detecting Trivariate Associations in High-Dimensional Datasets
QOTIC equitably measures dependence among three variables and exceeds existing methods in generality and equitability, as it has general test functions and is applicable to detecting multivariable correlation in datasets of various sample sizes and noise levels.
independence: Fast Rank Tests
In all circumstances under which the classical Hoeffding independence test is applicable, the R package, independence, offers a highly optimized implementation of these rank-based tests that provide novel competitive algorithms for consistent testing against all alternatives.


Detecting Novel Associations in Large Data Sets
A measure of dependence for two-variable relationships: the maximal information coefficient (MIC), which captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination of the data relative to the regression function.
Brownian distance covariance
Distance correlation is a new class of multivariate dependence coefficients applicable to random vectors of arbitrary and not necessarily equal dimension. Distance covariance and distance correlation…