• Corpus ID: 239885712

Estimating Mutual Information via Geodesic kNN

Alexander Marx and Jonas Fischer
Estimating mutual information (MI) between two continuous random variables X and Y makes it possible to capture non-linear dependencies between them non-parametrically. As such, MI estimation lies at the core of many data science applications. Yet robustly estimating MI for high-dimensional X and Y remains an open research question. In this paper, we formulate this problem through the lens of manifold learning. That is, we leverage the common assumption that the information of X and Y is captured by a…


Algorithms for manifold learning
The motivation, background, and algorithms proposed for manifold learning are discussed and Isomap, Locally Linear Embedding, Laplacian Eigenmaps, Semidefinite Embeddings, and a host of variants of these algorithms are examined.
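As an illustration of the idea behind Isomap-style methods, geodesic distances on a manifold are commonly approximated by shortest paths in a k-nearest-neighbor graph. The following is a minimal sketch under that assumption; the function name `geodesic_distances` and the default `k` are hypothetical choices for illustration, and this is not the implementation of any paper listed here.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path
from scipy.spatial import cKDTree


def geodesic_distances(points, k=8):
    """Approximate pairwise geodesic distances via a kNN graph (illustrative sketch)."""
    points = np.asarray(points, dtype=float)
    n = points.shape[0]

    # Query k + 1 neighbors because each point is returned as its own nearest neighbor.
    dist, idx = cKDTree(points).query(points, k=k + 1)

    # Build a sparse graph whose edge weights are Euclidean distances to the k neighbors.
    rows = np.repeat(np.arange(n), k)
    cols = idx[:, 1:].ravel()
    weights = dist[:, 1:].ravel()
    graph = csr_matrix((weights, (rows, cols)), shape=(n, n))

    # Dijkstra on the undirected graph; unreachable pairs are reported as inf.
    return shortest_path(graph, method="D", directed=False)
```

On points sampled densely along a curve, the graph distance between two points tracks the arc length between them rather than the straight-line distance. If the neighbor graph is disconnected, some entries are infinite, so `k` may need to be increased.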
Random Forests
  • L. Breiman
  • Machine Learning
  • 2001
Internal estimates monitor error, strength, and correlation; they are used to show the response to increasing the number of features used in the forest, and they are also applicable to regression.
Geodesic Forests
Fast-BIC, a fast Bayesian Information Criterion statistic for Gaussian mixture models, is developed, and Geodesic Forests (GF) is shown to be robust to high-dimensional noise, whereas other methods, such as Isomap, UMAP, and FLANN, quickly deteriorate in such settings.
Estimating mutual information.
Two classes of improved estimators for the mutual information M(X,Y) are presented. They work from samples of random points distributed according to some joint probability density μ(x,y) and are based on entropy estimates from k-nearest neighbor distances.
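A minimal sketch of a k-nearest-neighbor MI estimator of this kind is given below. It follows the commonly stated form of the first such estimator, ψ(k) + ψ(N) − ⟨ψ(n_x + 1) + ψ(n_y + 1)⟩, using the max-norm in the joint space. The function name `ksg_mutual_information` and the default `k=3` are assumptions made for illustration, not details taken from the summary above.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma


def ksg_mutual_information(x, y, k=3):
    """kNN-based MI estimate in nats (illustrative sketch, max-norm in joint space)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = x.shape[0]
    joint = np.hstack([x, y])

    # Max-norm distance to the k-th neighbor in the joint space.
    # k + 1 is queried because each point is its own nearest neighbor.
    dist, _ = cKDTree(joint).query(joint, k=k + 1, p=np.inf)
    eps = dist[:, -1]

    # Count marginal neighbors strictly closer than eps, excluding the point itself.
    radius = np.nextafter(eps, 0)
    nx = cKDTree(x).query_ball_point(x, radius, p=np.inf, return_length=True) - 1
    ny = cKDTree(y).query_ball_point(y, radius, p=np.inf, return_length=True) - 1

    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))
```

For two jointly Gaussian variables with correlation ρ, the true MI is −½ ln(1 − ρ²), which gives a convenient sanity check: the estimate should approach that value as the sample size grows and stay near zero for independent inputs.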
Geometric k-nearest neighbor estimation of entropy and mutual information
A series of numerical examples suggests that local geometry is a source of problems for kNN methods such as the Kraskov-Stögbauer-Grassberger estimator when local geometric effects cannot be removed by global preprocessing of the data.
Density functional estimators with k-nearest neighbor bandwidths
This work proposes a novel estimator, based on local likelihood density estimators, that mitigates boundary biases. It also provides a simple debiasing scheme that precomputes the asymptotic bias and divides it out.
Partial mutual information for coupling analysis of multivariate time series.
We propose a method to discover couplings in multivariate time series, based on partial mutual information, an information-theoretic generalization of partial correlation. It represents the part of
Sample estimate of the entropy of a random vector
  • Probl. Peredachi Inf., vol. 23, pp. 9–16, 1987.
Estimating Conditional Mutual Information for Discrete-Continuous Mixtures using Multi-Dimensional Adaptive Histograms
Conditional mutual information (CMI) for such mixture variables, defined based on the Radon-Nikodym derivative, can be written as a sum of entropies, just like CMI for purely discrete or continuous data, and can be estimated by learning an adaptive histogram model.
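For reference, the textbook decomposition of conditional mutual information into entropies has the following form. The symbols X, Y, and Z are generic here and are not taken from the cited paper.

```latex
I(X; Y \mid Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z)
```

Each term on the right is a joint or marginal entropy, so any consistent entropy estimator, such as one built on a histogram model, yields a plug-in estimate of the left-hand side.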
Discovering Functional Dependencies from Mixed-Type Data
This paper analyzes fundamental questions and derives formal criteria as to when a discretization process applied to a mixed set of random variables leads to consistent estimates of mutual information, and derives an estimator framework applicable to any task that involves estimating mutual information from multivariate and mixed-type data.