Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks

Jean Hausser and Korbinian Strimmer. J. Mach. Learn. Res.
We present a procedure for effective estimation of entropy and mutual information from small-sample data, and apply it to the problem of inferring high-dimensional gene association networks. Specifically, we develop a James-Stein-type shrinkage estimator, resulting in a procedure that is highly efficient statistically as well as computationally. Despite its simplicity, we show that it outperforms eight other entropy estimation procedures across a diverse range of sampling scenarios and data… 
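The abstract does not spell out the estimator, so the following is only a minimal sketch of a James-Stein-type shrinkage approach to entropy and mutual information. It assumes a uniform shrinkage target (1/p for each of p cells) and a closed-form shrinkage intensity estimated from the counts, clipped to [0, 1]. The function names are hypothetical and chosen for illustration; they are not taken from the paper or from any library.

```python
import math
from typing import List, Sequence, Tuple


def shrinkage_frequencies(counts: Sequence[int]) -> Tuple[List[float], float]:
    """Shrink maximum-likelihood cell frequencies toward the uniform target.

    Returns the shrunken frequencies and the estimated intensity lambda.
    Assumption: target t_k = 1/p and
    lambda = (1 - sum(f_k^2)) / ((n - 1) * sum((t_k - f_k)^2)), clipped to [0, 1].
    """
    n = sum(counts)
    p = len(counts)
    if n <= 0 or p == 0:
        raise ValueError("need at least one cell and one observation")
    ml = [c / n for c in counts]          # maximum-likelihood frequencies
    target = 1.0 / p                      # uniform shrinkage target
    denom = (n - 1) * sum((target - f) ** 2 for f in ml)
    if denom == 0.0:
        # Single observation, or ML estimate already equals the target:
        # fall back to full shrinkage.
        lam = 1.0
    else:
        lam = (1.0 - sum(f * f for f in ml)) / denom
        lam = min(1.0, max(0.0, lam))     # keep the intensity in [0, 1]
    shrunk = [lam * target + (1.0 - lam) * f for f in ml]
    return shrunk, lam


def plugin_entropy(freqs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a frequency vector; 0 * log 0 is taken as 0."""
    return -sum(f * math.log(f) for f in freqs if f > 0.0)


def shrinkage_entropy(counts: Sequence[int]) -> float:
    """Plug the shrunken frequencies into the Shannon entropy formula."""
    freqs, _ = shrinkage_frequencies(counts)
    return plugin_entropy(freqs)


def shrinkage_mutual_information(table: Sequence[Sequence[int]]) -> float:
    """Mutual information (in nats) from a two-way table of joint counts.

    The joint cell frequencies are shrunk first; the marginals are then
    derived from the shrunken joint, so the result is non-negative
    up to rounding error.
    """
    n_cols = len(table[0])
    flat = [c for row in table for c in row]
    joint, _ = shrinkage_frequencies(flat)
    rows = [sum(joint[i * n_cols:(i + 1) * n_cols]) for i in range(len(table))]
    cols = [sum(joint[j::n_cols]) for j in range(n_cols)]
    return plugin_entropy(rows) + plugin_entropy(cols) - plugin_entropy(joint)
```

For example, `shrinkage_frequencies([9, 1])` gives an intensity of 0.0625 under the formula above, so the estimate moves only slightly from (0.9, 0.1) toward (0.5, 0.5). With few observations spread over many cells the intensity approaches 1 and the estimate approaches the uniform distribution.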

Influence of Statistical Estimators of Mutual Information and Data Heterogeneity on the Inference of Gene Regulatory Networks

This study investigates the influence of discrete mutual information (MI) estimators on the global and local network inference performance of the C3NET algorithm and provides guidance for an efficient design of a simulation study in the context of network inference, supporting a systems approach.

A comprehensive comparison of association estimators for gene network inference algorithms

B-spline, Pearson-based Gaussian, and Spearman-based Gaussian association score estimators outperform the others on all datasets in terms of both performance and runtime. When the CT operation is used, the inference performance of the estimators mostly increases, especially for two synthetic datasets.

Mutual information estimation for transcriptional regulatory network inference

CL was found to be the best performing inference algorithm, corroborating previous results indicating that it is the state-of-the-art mutual inference algorithm. The effect of the discretisation parameters is studied in detail.

Hierarchical estimation of parameters in Bayesian networks

Graph Estimation with Joint Additive Models

This work proposes a semi-parametric method, graph estimation with joint additive models, which allows the conditional means of the features to take on an arbitrary additive form. It is shown to perform better than existing methods when there are non-linear relationships among the features, and to be comparable to methods that assume multivariate normality when the conditional means are linear.

Entropy and the species accumulation curve: a novel entropy estimator via discovery rates of new species

It is shown that there is a close relationship between Shannon entropy and the species accumulation curve, which depicts the cumulative number of observed species as a function of sample size, and the resulting entropy estimator is nearly unbiased.

Improved mean estimation and its application to diagonal discriminant analysis

This article investigates the family of shrinkage estimators for the mean value under the quadratic loss function and proposes a shrinkage-based diagonal discriminant rule, which outperforms its original competitor in a wide range of settings.

Adaptive input data transformation for improved network reconstruction with information theoretic algorithms

The nature and properties of the inevitable bias are described, and an adaptive partitioning scheme for MI estimation is proposed that transforms the sample data using parameters determined from its local and global distribution, guaranteeing a more robust and reliable reconstruction algorithm.

Dirichlet Bayesian Network Scores and the Maximum Entropy Principle

It is shown how the Bayesian Dirichlet equivalent uniform (BDeu) may violate the maximum entropy principle when applied to sparse data and how it may also be problematic from a Bayesian model selection perspective.

Does Dirichlet prior smoothing solve the Shannon entropy estimation problem?

The theory of approximation is harnessed using positive linear operators for analyzing the bias of plug-in estimators for general functionals under arbitrary statistical models, thereby further consolidating the interplay between these two fields.



A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics

This work proposes a novel shrinkage covariance estimator that exploits the Ledoit-Wolf (2003) lemma for analytic calculation of the optimal shrinkage intensity and applies it to the problem of inferring large-scale gene association networks.

Accurate Ranking of Differentially Expressed Genes by a Distribution-Free Shrinkage Approach

The “shrinkage t” statistic is introduced, a novel and model-free shrinkage estimate of the variance vector across genes that is derived in a quasi-empirical Bayes setting and consistently leads to highly accurate rankings.

An empirical Bayes approach to inferring large-scale gene association networks

A novel framework is introduced for small-sample inference of graphical models from gene expression data, focusing on the so-called graphical Gaussian models (GGMs) that are now frequently used to describe gene association networks and to detect conditionally dependent genes.

From correlation to causation networks: a simple approximate learning algorithm and its application to high-dimensional plant gene expression data

A heuristic for the statistical learning of a high-dimensional "causal" network that not only yields sensible first-order approximations of the causal structure in high-dimensional genomic data but is also computationally highly efficient.

Bayes' estimators of generalized entropies

This paper uses the functional relationship between the Tsallis and Rényi entropies to estimate the Rényi entropy from the Bayes estimator of the order-q Tsallis entropy, and compares these novel estimators with the frequency-count estimators of both entropies.

Information-Theoretic Inference of Large Transcriptional Regulatory Networks

MRNET is assessed by benchmarking it against RELNET, CLR, and ARACNE, three state-of-the-art information-theoretic methods for large (up to several thousands of genes) network inference. The results show that MRNET is competitive with these methods.

Coverage‐adjusted entropy estimation

It is proved that the coverage-adjusted entropy estimator (CAE), due to Chao and Shen, is consistent and first-order optimal, with rate $O_P(1/\log n)$, in the class of distributions with finite entropy variance and that the Good-Turing coverage estimate and the total probability of unobserved words converge at rate $O_P(1/(\log n)^q)$.

Simultaneous Estimation of Multinomial Cell Probabilities

A new estimator, p*, of the multinomial parameter vector is proposed, and it is shown to be a better choice in most situations than the usual estimator (the vector of observed proportions).

High-dimensional graphs and variable selection with the Lasso

It is shown that neighborhood selection with the Lasso is a computationally attractive alternative to standard covariance selection for sparse high-dimensional graphs and is hence equivalent to variable selection for Gaussian linear models.