Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations, both functional and not, and for functional relationships provides a score that roughly …
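MIC is defined by searching over grids laid on the scatterplot: for each grid size, find the partition with the highest mutual information, normalize, and take the maximum. The sketch below is a rough, hypothetical approximation, not the authors' algorithm: it tries only equal-frequency (quantile) grids instead of optimizing the partition, so it can only underestimate what an exhaustive search over grids of the same sizes would give. The names `approx_mic`, `mutual_information_bits`, and `ranks`, and the grid budget `n ** 0.6`, are assumptions for illustration.

```python
import random
from collections import Counter
from math import log2


def mutual_information_bits(a, b):
    """Plug-in mutual information (in bits) between two discrete label lists."""
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    mi = 0.0
    for (i, j), c in joint.items():
        mi += (c / n) * log2(c * n / (pa[i] * pb[j]))
    return mi


def ranks(values):
    """Rank of each value (0..n-1); ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for pos, i in enumerate(order):
        r[i] = pos
    return r


def approx_mic(xs, ys, alpha=0.6):
    """Hypothetical MIC-like score restricted to equal-frequency grids.

    For every grid with nx columns and ny rows (nx * ny <= budget, where the
    budget is roughly n**alpha), bin both variables into equal-frequency bins,
    compute the mutual information, divide by log2(min(nx, ny)), and keep the
    maximum over all grid sizes.
    """
    n = len(xs)
    budget = max(4, int(n ** alpha))  # assumed grid-size cap
    rx, ry = ranks(xs), ranks(ys)
    best = 0.0
    for nx in range(2, budget // 2 + 1):
        xbins = [r * nx // n for r in rx]
        for ny in range(2, budget // nx + 1):
            ybins = [r * ny // n for r in ry]
            score = mutual_information_bits(xbins, ybins) / log2(min(nx, ny))
            best = max(best, score)
    return best


if __name__ == "__main__":
    xs = [i / 100 for i in range(100)]
    print(approx_mic(xs, xs))  # close to 1.0 for a noiseless line
    rng = random.Random(0)
    print(approx_mic(xs, [rng.random() for _ in xs]))  # much lower for noise
```

Because mutual information on an $x \times y$ grid cannot exceed $\log_2 \min(x, y)$, every normalized score lies in $[0, 1]$; the equal-frequency shortcut trades accuracy for a very simple search.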
A measure of dependence is said to be equitable if it gives similar scores to equally noisy relationships of different types. Equitability is important in data exploration when the goal is to identify a relatively small set of strongest associations within a dataset as opposed to finding as many non-zero associations as possible, which often are too many to …
As data sets grow in dimensionality, non-parametric measures of dependence have seen increasing use in data exploration due to their ability to identify non-trivial relationships of all kinds. One common use of these tools is to test a null hypothesis of statistical independence on all variable pairs in a data set. However, because this approach attempts to …
Background: During an influenza pandemic, a substantial proportion of transmission is thought to occur in households. We used data on influenza progression in individuals and their contacts collected by the City of Milwaukee Health Department (MHD) to study the transmission of pandemic influenza A/H1N1 virus in 362 households in Milwaukee, WI, and the …
Although we appreciate Kinney and Atwal's interest in equitability and maximal information coefficient (MIC), we believe they misrepresent our work. We highlight a few of our main objections below. Regarding our original paper (1), Kinney and Atwal (2) state "MIC is said to satisfy not just the heuristic notion of equitability, but also the mathematical …
In exploratory data analysis, we are often interested in identifying promising pairwise associations for further analysis while filtering out weaker, less interesting ones. This can be accomplished by computing a measure of dependence on all possible variable pairs and examining the highest-scoring pairs, provided the measure of dependence used assigns …
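The screening workflow described here (score every pair, keep the top ones) can be sketched as follows. The helper names `top_pairs` and `pearson_r2` are hypothetical, and squared Pearson correlation is only a stand-in: it favors linear relationships, so in practice one would pass a more equitable measure, such as MIC, through the `score` argument.

```python
from itertools import combinations


def pearson_r2(x, y):
    """Squared sample Pearson correlation (placeholder dependence measure)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def top_pairs(columns, score=pearson_r2, k=3):
    """Score every pair of named columns and return the k highest-scoring pairs."""
    scored = [((a, b), score(columns[a], columns[b]))
              for a, b in combinations(sorted(columns), 2)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    cols = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10], "c": [5, 1, 4, 2, 3]}
    for pair, s in top_pairs(cols):
        print(pair, round(s, 3))
```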
Using data from the Gonococcal Isolate Surveillance Project, we studied changes in ciprofloxacin resistance in Neisseria gonorrhoeae isolates in the United States during 2002-2007. Compared with prevalence in heterosexual men, prevalence of ciprofloxacin-resistant N. gonorrhoeae infections showed a more pronounced increase in men who have sex with men …
How do we perceive the predictability of functions? We derive a rational measure of a function's predictability based on Gaussian process learning curves. Using this measure, we show that the smoothness of a function can be more important to predictability judgments than the variance of additive noise or the number of samples. These patterns can be captured …
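The exact measure derived in the paper is not given in this abstract. As an assumption, one standard way to trace a Gaussian process learning curve is to average the GP posterior variance over test inputs as a function of the number of samples; under a correctly specified prior, that variance equals the expected squared error of the posterior mean. The sketch uses a squared-exponential kernel, treats a longer `lengthscale` as a smoother function, and the names `expected_error`, `rbf`, and `solve` are hypothetical.

```python
from math import exp


def rbf(a, b, lengthscale):
    """Squared-exponential kernel with unit prior variance."""
    return exp(-0.5 * ((a - b) / lengthscale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def expected_error(n_train, lengthscale, noise_var, n_test=50):
    """Mean GP posterior variance on [0, 1] after n_train evenly spaced noisy samples."""
    xs = [(i + 0.5) / n_train for i in range(n_train)]
    # Training covariance K + noise_var * I.
    K = [[rbf(a, b, lengthscale) + (noise_var if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    total = 0.0
    for t in range(n_test):
        x_star = t / (n_test - 1)
        k_star = [rbf(x_star, a, lengthscale) for a in xs]
        alpha = solve(K, k_star)
        # Posterior variance: k(x*, x*) - k*^T (K + noise_var I)^{-1} k*.
        total += 1.0 - sum(k * a for k, a in zip(k_star, alpha))
    return total / n_test


if __name__ == "__main__":
    for ls in (0.05, 0.3):
        curve = [round(expected_error(n, ls, 0.1), 3) for n in (2, 5, 10, 20)]
        print(f"lengthscale={ls}: {curve}")
```

Comparing curves for different `lengthscale` and `noise_var` values gives one way to ask, in the spirit of the abstract, whether smoothness moves the curve more than noise or sample size.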
For high-dimensional data sets, it is common to evaluate a measure of dependence on every variable pair and retain the highest-scoring pairs for follow-up. If the statistic used systematically assigns higher scores to some relationship types (e.g., linear or exponential) over others, important relationships may be overlooked because of their type. This …
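The type bias described above is easy to reproduce with a familiar statistic. As an illustration of our own (the abstract does not name a statistic), squared Pearson correlation gives a perfect score to a noiseless line but essentially zero to a noiseless parabola on a symmetric interval, so a ranking by $r^2$ would bury the parabola even though $y$ is a deterministic function of $x$.

```python
def pearson(x, y):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


xs = [i / 10 for i in range(-10, 11)]  # symmetric grid on [-1, 1]
linear = [2 * x + 1 for x in xs]
parabola = [x * x for x in xs]

if __name__ == "__main__":
    print(pearson(xs, linear) ** 2)    # 1.0, up to rounding
    print(pearson(xs, parabola) ** 2)  # about 0, despite a noiseless dependence
```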
The maximal information coefficient (MIC) is a tool for finding the strongest pairwise relationships in a data set with many variables [1]. MIC is useful because it gives similar scores to equally noisy relationships of different types. This property, called equitability, is important for analyzing high-dimensional data sets. Here we formalize the theory …
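For reference, MIC as it is commonly stated in the original MIC work is summarized below. Here $D$ is a sample of $n$ points, $I^*(D, x, y)$ is the largest mutual information achievable by any grid with $x$ columns and $y$ rows applied to $D$, and $B(n)$ caps the grid size ($B(n) = n^{0.6}$ is the usual default). This is recalled from the literature rather than from the truncated abstract, so it summarizes the definition, not the formal theory of equitability the abstract goes on to develop.

```latex
M(D)_{x,y} = \frac{I^*(D, x, y)}{\log \min\{x, y\}},
\qquad
\mathrm{MIC}(D) = \max_{xy < B(n)} M(D)_{x,y}
```

Because $I^*(D, x, y) \le \log \min\{x, y\}$, each entry of the normalized matrix $M(D)$, and hence MIC itself, lies in $[0, 1]$.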