Scalable Mutual Information Estimation Using Dependence Graphs

@article{Noshad2019ScalableMI,
  title={Scalable Mutual Information Estimation Using Dependence Graphs},
  author={Morteza Noshad and Alfred O. Hero},
  journal={ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2019},
  pages={2962-2966}
}
  • M. Noshad, A. Hero
  • Published 27 January 2018
  • Computer Science
  • ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
The Mutual Information (MI) is an often used measure of dependency between two random variables utilized in information theory, statistics and machine learning. [] Key Method We propose a unified method for empirical non-parametric estimation of general MI function between random vectors in ${\mathbb{R}^d}$ based on $N$ i.i.d. samples.
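As a rough picture of the dependence-graph idea, the sketch below (the helper `hashed_mi` is hypothetical, written for this summary) hashes the X and Y samples into buckets and treats each occupied bucket pair as a weighted edge, so a plug-in MI over bucket counts falls out directly. The paper's EDGE estimator additionally applies an ensemble bias correction that is omitted here, and the bucket width `eps` is an illustrative tuning parameter.

```python
import numpy as np
from collections import Counter

def hashed_mi(x, y, eps=0.5):
    """Plug-in MI over hashed samples: quantize X and Y into buckets, then sum
    (N_ij/N) * log(N * N_ij / (N_i * N_j)) over occupied bucket pairs.
    Illustrates the dependence-graph idea only; the EDGE estimator's ensemble
    bias correction is omitted."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)
    # Hash each sample to a bucket index by fixed-width quantization.
    hx = [tuple(np.floor(row / eps).astype(int)) for row in x]
    hy = [tuple(np.floor(row / eps).astype(int)) for row in y]
    cx, cy, cxy = Counter(hx), Counter(hy), Counter(zip(hx, hy))
    # Each occupied (i, j) bucket pair is an edge of the dependence graph.
    mi = 0.0
    for (i, j), nij in cxy.items():
        mi += (nij / n) * np.log(n * nij / (cx[i] * cy[j]))
    return mi

# Example: correlated Gaussians, where the true MI is -0.5 * log(1 - rho**2).
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = 0.8 * x + 0.6 * rng.normal(size=5000)
print(hashed_mi(x, y, eps=0.5))
```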

Citations

Estimation of Information Measures and Its Applications in Machine Learning

This thesis proves that the average of an appropriate function of density ratio estimates over all of the points converges to the divergence or mutual information measures.

Geometric Estimation of Multivariate Dependency

This paper proposes a geometric estimator of dependency between a pair of multivariate random variables, based on a randomly permuted geometric graph (the minimal spanning tree) over the two multivariate samples, which converges to a quantity equivalent to the Henze–Penrose divergence.
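For intuition only, the permuted-graph idea can be sketched as follows (my reading, not necessarily the paper's exact construction): pool the observed pairs with pairs whose Y-coordinates are randomly permuted, build a minimal spanning tree over the pooled points, and turn the count of cross-sample edges into the Friedman–Rafsky estimate of the Henze–Penrose divergence. The helper name `permuted_mst_dependence` is hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import minimum_spanning_tree

def permuted_mst_dependence(x, y, seed=0):
    """Friedman-Rafsky / Henze-Penrose statistic between the observed pairs
    (x_i, y_i) and pairs with y randomly permuted, computed from the number
    of cross-sample edges in the MST of the pooled points."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x).reshape(len(x), -1)
    y = np.asarray(y).reshape(len(y), -1)
    n = len(x)
    joint = np.hstack([x, y])                          # samples from the joint law
    permuted = np.hstack([x, y[rng.permutation(n)]])   # proxy for the product law
    pooled = np.vstack([joint, permuted])
    labels = np.concatenate([np.zeros(n), np.ones(n)])
    # MST over the pooled sample; count edges joining the two samples.
    mst = minimum_spanning_tree(cdist(pooled, pooled)).tocoo()
    cross = np.sum(labels[mst.row] != labels[mst.col])
    # Friedman-Rafsky estimate of the Henze-Penrose divergence (m = n case).
    return 1.0 - cross / n
```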

Sliced Mutual Information: A Scalable Measure of Statistical Dependence

This paper proposes sliced MI (SMI) as a surrogate measure of dependence, defined as an average of MI terms between one-dimensional random projections, and shows that it preserves many of the structural properties of classic MI while gaining scalable computation and efficient estimation from samples.
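A minimal Monte Carlo sketch of the slicing idea (not the authors' estimator): average 1-D MI estimates over random projections. Here scikit-learn's KSG-style `mutual_info_regression` is assumed as the per-slice estimator, and the number of slices `m` is an illustrative parameter.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def sliced_mi(x, y, m=100, k=3, seed=0):
    """Monte Carlo estimate of sliced MI: average the 1-D MI between random
    one-dimensional projections of X and Y over m slices."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x).reshape(len(x), -1)
    y = np.asarray(y).reshape(len(y), -1)
    total = 0.0
    for _ in range(m):
        # Draw uniformly random directions on the unit spheres of X- and Y-space.
        theta = rng.normal(size=x.shape[1]); theta /= np.linalg.norm(theta)
        phi = rng.normal(size=y.shape[1]); phi /= np.linalg.norm(phi)
        # 1-D MI between the projected samples (kNN-based estimator).
        total += mutual_info_regression((x @ theta).reshape(-1, 1), y @ phi,
                                        n_neighbors=k)[0]
    return total / m
```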

Inductive Mutual Information Estimation: A Convex Maximum-Entropy Copula Approach

This work proposes a novel estimator of the mutual information between two ordinal vectors that is marginal-invariant, always non-negative, unbounded for any sample size $n$, consistent, has MSE rate $O(1/n)$, and is more data-efficient than competing approaches.

Revisiting Probability Distribution Assumptions for Information Theoretic Feature Selection

This paper reveals two sets of distribution assumptions underlying many MI- and VI-based methods, the Feature Independence Distribution and the Geometric Mean Distribution, and proposes a logical extension called the Arithmetic Mean Distribution, which leads to an unbiased and normalised estimation of probability densities.

Estimating Probability Distributions and their Properties

The derivation of minimax convergence rates is considered, which may help explain why these tools appear to perform well at problems that are intractable from traditional perspectives of nonparametric statistics.

Uniform Partitioning of Data Grid for Association Detection

This article introduces the uniform information coefficient (UIC), which measures the amount of dependence between two multidimensional variables and is able to detect both linear and non-linear associations.

Proposal Estimating Probability Distributions and their Properties

This thesis considers a large, novel class of losses, under which high-dimensional nonparametric distribution estimation is more tractable than under the usual L2 loss, helping to explain why these methods appear to perform well at problems that are intractable from traditional perspectives of nonparametric statistics.

Information Bottleneck Analysis by a Conditional Mutual Information Bound

It is demonstrated that the conditional mutual information I(z;x|y) provides an alternative upper bound for I(z;n), and this bound is applicable even if z is not a sufficient representation of x, that is, I(z;y) ≠ I(x;y).

Diffeomorphic Information Neural Estimation

This study introduces DINE, a novel approach for estimating the CMI of continuous random variables, inspired by the invariance of CMI under diffeomorphic maps, and shows that the variables of interest can be replaced with appropriate surrogates that follow simpler distributions, allowing the CMI to be evaluated via analytical solutions.

References

SHOWING 1-10 OF 26 REFERENCES

Estimating Mutual Information for Discrete-Continuous Mixtures

This paper designs a novel estimator for the mutual information of discrete-continuous mixtures, proves that the proposed estimator is consistent, and provides numerical experiments suggesting its superiority over other heuristics.

Relative performance of mutual information estimation methods for quantifying the dependence among short and noisy data.

In situations where the approximate data sizes are known in advance and exploratory data analysis and/or domain knowledge can be used to provide a priori insights into the noise-to-signal ratios, the results in the paper point to a way forward for automating the process of MI estimation.

Scalable Hash-Based Estimation of Divergence Measures

  • M. Noshad, A. Hero
  • Computer Science, Mathematics
    2018 Information Theory and Applications Workshop (ITA)
  • 2018
To the best of the authors' knowledge, this is the first empirical divergence estimator that has optimal computational complexity and achieves the optimal parametric MSE estimation rate.

Direct estimation of information divergence using nearest neighbor ratios

This work proposes a direct estimation method for Rényi and f-divergence measures based on a new graph theoretical interpretation, and derives an ensemble estimator that achieves the parametric MSE rate of O(1/N).
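As a bare-bones illustration of the nearest-neighbor-ratio idea specialized to the KL divergence (the actual estimator handles general Rényi and f-divergences and includes bias corrections not shown here): among the k nearest neighbors of each point of the first sample within the pooled data, the sample proportions give a local density-ratio estimate. The helper `nnr_kl` is hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def nnr_kl(x, y, k=10):
    """Crude KL(f||g) estimate from samples x ~ f and y ~ g via nearest-neighbor
    ratios: for each x_i, its k nearest neighbors in the pooled sample split into
    n_i points from x and m_i points from y, and f/g at x_i is approximated by
    (n_i / N) / (m_i / M)."""
    x = np.asarray(x).reshape(len(x), -1)
    y = np.asarray(y).reshape(len(y), -1)
    n, m = len(x), len(y)
    pooled = np.vstack([x, y])
    labels = np.concatenate([np.ones(n), np.zeros(m)])  # 1 = from x, 0 = from y
    tree = cKDTree(pooled)
    # k + 1 neighbors because each x_i is its own nearest neighbor in the pool.
    _, idx = tree.query(x, k=k + 1)
    est = 0.0
    for i in range(n):
        flags = labels[idx[i, 1:]]
        ni = int(flags.sum())                 # neighbors drawn from x (the f-sample)
        mi = k - ni                           # neighbors drawn from y (the g-sample)
        ratio = (ni * m) / (n * max(mi, 1))   # local estimate of f/g at x_i
        est += np.log(max(ratio, 1e-12))
    return est / n
```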

Ensemble estimation of mutual information

This work derives the mean squared error convergence rates of kernel density-based plug-in estimators of mutual information measures between two multidimensional random variables X and Y and proposes an ensemble estimator of these information measures for the second case, which achieves the 1 /N parametric convergence rate.
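Schematically, the weighted-ensemble construction in this line of work combines base plug-in estimators so that the leading bias terms cancel; the display below is a generic sketch, with the basis functions ψ_i and index set I standing in for the paper's specific bias expansion.

```latex
% Generic weighted-ensemble sketch: \hat{E}_l are base plug-in estimators
% indexed by a bandwidth-like parameter l, with bias expansion
% B(\hat{E}_l) = \sum_{i \in I} c_i\,\psi_i(l)\,\phi_i(N) + O(1/\sqrt{N}).
\hat{E}_w = \sum_{l \in \bar{l}} w(l)\,\hat{E}_l,
\qquad
w^\ast = \arg\min_{w} \|w\|_2
\quad \text{s.t.} \quad
\sum_{l \in \bar{l}} w(l) = 1,
\;\;
\sum_{l \in \bar{l}} w(l)\,\psi_i(l) = 0 \;\; \text{for all } i \in I.
```

With the leading bias terms cancelled, the remaining bias is of order $1/\sqrt{N}$ and the MSE attains the parametric $1/N$ rate quoted above.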

Estimating mutual information.

Two classes of improved estimators for the mutual information M(X,Y), from samples of random points distributed according to some joint probability density μ(x,y), based on entropy estimates from k-nearest neighbor distances are presented.
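For reference, the first of the two KSG estimators can be written in a few lines; this is a minimal sketch assuming continuous samples with no tied distances, with the helper name `ksg_mi` chosen here for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi(x, y, k=3):
    """First KSG estimator: I = psi(k) + psi(N) - <psi(nx+1) + psi(ny+1)>,
    with marginal neighbor counts taken within the max-norm distance to the
    k-th nearest neighbor in the joint (X, Y) space."""
    x = np.asarray(x).reshape(len(x), -1)
    y = np.asarray(y).reshape(len(y), -1)
    n = len(x)
    joint = np.hstack([x, y])
    # Distance to the k-th nearest neighbor in the joint space (self excluded).
    eps = cKDTree(joint).query(joint, k=k + 1, p=np.inf)[0][:, -1]
    tree_x, tree_y = cKDTree(x), cKDTree(y)
    # Count marginal points strictly closer than eps_i, minus the point itself.
    nx = np.array([len(tree_x.query_ball_point(x[i], eps[i] - 1e-12, p=np.inf)) - 1
                   for i in range(n)])
    ny = np.array([len(tree_y.query_ball_point(y[i], eps[i] - 1e-12, p=np.inf)) - 1
                   for i in range(n)])
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))
```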

Estimation of Entropy and Mutual Information

  • L. Paninski
  • Mathematics, Computer Science
    Neural Computation
  • 2003
An exact local expansion of the entropy function is used to prove almost sure consistency and central limit theorems for three of the most commonly used discretized information estimators, and leads to an estimator with some nice properties: the estimator comes equipped with rigorous bounds on the maximum error over all possible underlying probability distributions, and this maximum error turns out to be surprisingly small.

Fast kNN Graph Construction with Locality Sensitive Hashing

This paper proposes an efficient algorithm for approximating kNN graphs, which has a time complexity of only O(l(d + log n)n) (d is the dimensionality and l is usually a small number) and is much faster than most existing fast methods.
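A rough sketch of the divide-and-conquer idea (not the paper's exact algorithm): hash the points with random-projection LSH, brute-force distances only within each bucket, and union the candidates over several hash tables. The bit width `b`, number of tables `t`, and helper name `lsh_knn_graph` are illustrative.

```python
import numpy as np
from collections import defaultdict

def lsh_knn_graph(points, k=5, b=8, t=4, seed=0):
    """Approximate kNN graph via random-projection LSH: hash every point into a
    bucket per table, brute-force distances only inside each bucket, and keep
    the k best neighbors found across all t tables."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points)
    n, d = points.shape
    candidates = defaultdict(set)
    for _ in range(t):
        # b random hyperplanes -> a b-bit signature per point.
        planes = rng.normal(size=(d, b))
        codes = points @ planes > 0
        buckets = defaultdict(list)
        for i, code in enumerate(codes):
            buckets[code.tobytes()].append(i)
        # Points sharing a signature become mutual neighbor candidates.
        for members in buckets.values():
            for i in members:
                candidates[i].update(members)
    graph = {}
    for i, cand in candidates.items():
        cand.discard(i)
        cand = np.fromiter(cand, dtype=int)
        dist = np.linalg.norm(points[cand] - points[i], axis=1)
        graph[i] = cand[np.argsort(dist)[:k]].tolist()
    return graph
```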

Exponential Concentration of a Density Functional Estimator

This work analyzes a plug-in estimator for a large class of integral functionals of one or more continuous probability densities and proves the estimator is exponentially concentrated about its mean, whereas most previous related results have proven only expected error bounds on estimators.

Nonparametric von Mises Estimators for Entropies, Divergences and Mutual Informations

These estimators are derived from the von Mises expansion and are based on the theory of influence functions, which appear in the semiparametric statistics literature; it is shown that estimators based either on data splitting or a leave-one-out technique enjoy fast rates of convergence and other favorable theoretical properties.