Inferring a Property of a Large System from a Small Number of Samples

Damián G. Hernández and Inés Samengo
Inferring the value of a property of a large stochastic system is a difficult task when the number of samples is insufficient to reliably estimate the probability distribution. The Bayesian estimator of the property of interest requires knowledge of the prior distribution, and in many situations it is not clear which prior should be used. Several estimators have been developed so far in which the proposed prior is individually tailored for each property of interest; such is the case, for…

Low probability states, data statistics, and entropy estimation

This work shows that well-known entropy estimators for probability distributions on discrete state spaces model the structure of the low-probability tail based largely on a few statistics of the data: the sample size, the maximum likelihood estimate, the number of coincidences among the samples, and the dispersion of the coincidences.
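As a rough illustration of the kind of statistics named in this summary, the following Python sketch computes the sample size, the plug-in (maximum likelihood) entropy, and a coincidence count from a list of discrete samples. The function name `sample_statistics` is hypothetical, and the definition of a coincidence used here (a sample that repeats an already observed state) is an assumption that may differ from the one in the cited work. The dispersion of the coincidences is not computed because its definition is not given in the summary.

```python
import math
from collections import Counter


def sample_statistics(samples):
    """Return (sample size, plug-in entropy in bits, coincidence count).

    Hypothetical helper for illustration only; not taken from the cited work.
    """
    counts = Counter(samples)
    n = len(samples)
    # Plug-in (maximum likelihood) entropy: insert the empirical
    # frequencies c / n into Shannon's formula.
    h_ml = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Assumption: a coincidence is a sample that repeats a state that was
    # already observed, so the count is n minus the number of distinct states.
    coincidences = n - len(counts)
    return n, h_ml, coincidences


n, h_ml, coincidences = sample_statistics(["a", "b", "a", "c", "a", "b"])
print(n, round(h_ml, 3), coincidences)
```

In the undersampled regime most states are seen at most once, so the coincidence count stays small and the plug-in entropy tends to underestimate the true entropy.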



Estimating the Mutual Information between Two Discrete, Asymmetric Variables with Limited Samples

A consistent estimator is obtained that presents very low bias, outperforming previous methods even when the sampled data contain few coincidences, and is applicable to those cases in which the marginal distribution of one of the two variables—the one with minimal entropy—is well sampled.

Estimating probabilities from experimental frequencies.

  • I. Samengo
  • Mathematics
    Physical review. E, Statistical, nonlinear, and soft matter physics
  • 2002
In this work, the probability that the true distribution is q, given that the frequency count f was sampled, is studied, and a thermodynamic potential is defined that allows an easy evaluation of the mean Kullback-Leibler divergence between the true and measured distributions.

Bayesian and Quasi-Bayesian Estimators for Mutual Information from Discrete Data

This work discusses several regularized estimators for MI that employ priors based on the Dirichlet distribution, examines their performance on a variety of simulated datasets, and shows that, surprisingly, quasi-Bayesian estimators generally outperform the authors' Bayesian estimator.

Bayesian entropy estimation for countable discrete distributions

This work considers the problem of estimating Shannon's entropy H from discrete data, in cases where the number of possible symbols is unknown or even countably infinite, and derives a family of continuous measures for mixing Pitman-Yor processes to produce an approximately flat prior over H.

A simple method for estimating the entropy of neural activity

This work proposes a simple scheme for estimating the entropy in the undersampled regime, which bounds its value from both below and above, and applies it to actual measurements of neural activity in populations with up to 100 cells.

Information Estimation Using Non-Parametric Copulas

The non-parametric copula information estimator will be a powerful tool for estimating mutual information across a broad range of data: it balances general applicability to arbitrarily shaped statistical dependencies with accurate and robust performance at small sample sizes.

Estimation of Entropy and Mutual Information

  • L. Paninski
  • Mathematics, Computer Science
    Neural Computation
  • 2003
An exact local expansion of the entropy function is used to prove almost sure consistency and central limit theorems for three of the most commonly used discretized information estimators, and leads to an estimator with some nice properties: the estimator comes equipped with rigorous bounds on the maximum error over all possible underlying probability distributions, and this maximum error turns out to be surprisingly small.

Estimating mutual information.

Two classes of improved estimators for the mutual information M(X,Y) are presented, computed from samples of random points distributed according to some joint probability density μ(x,y) and based on entropy estimates from k-nearest neighbor distances.

Tight Data-Robust Bounds to Mutual Information Combining Shuffling and Model Selection Techniques

This application shows that, even in the presence of strong correlations, the methods precisely constrain the amount of information encoded by real spike trains recorded in vivo, providing data-robust upper and lower bounds to the mutual information.

Statistical physics of inference: thresholds and algorithms

The connection between inference and statistical physics is currently witnessing an impressive renaissance, and the current state of the art is reviewed with a pedagogical focus on the Ising model, which, formulated as an inference problem, is called the planted spin glass.