Information divergence is more χ2-distributed than the χ2-statistics

  • P. Harremoës, G. Tusnády
  • Published 2012
  • Mathematics, Computer Science
  • 2012 IEEE International Symposium on Information Theory Proceedings
For testing goodness of fit it is very popular to use either the χ²-statistic or the G²-statistic (information divergence). Asymptotically both are χ²-distributed, so an obvious question is which of the two statistics has a distribution closer to the χ²-distribution. Surprisingly, when there is only one degree of freedom it seems that the distribution of information divergence is much better approximated by a χ²-distribution than…
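The two statistics compared in the abstract have simple closed forms: Pearson's χ² = Σ (O − E)²/E and G² = 2 Σ O ln(O/E) over the cells of the table. A minimal sketch (the fair-die counts below are invented for illustration):

```python
import numpy as np

def pearson_chi2(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over cells."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return np.sum((observed - expected) ** 2 / expected)

def g2(observed, expected):
    """G^2 statistic (information divergence): 2 * sum of O * ln(O / E).

    Cells with O = 0 contribute zero by the usual 0 * ln 0 = 0 convention.
    """
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    mask = observed > 0
    return 2.0 * np.sum(observed[mask] * np.log(observed[mask] / expected[mask]))

# Goodness-of-fit test of a fair die against 60 hypothetical rolls.
observed = np.array([12, 8, 11, 9, 13, 7])
expected = np.full(6, 10.0)
print(pearson_chi2(observed, expected))  # 2.8
print(g2(observed, expected))            # close to the Pearson value here
```

With moderate counts the two statistics nearly agree, as both are asymptotically χ²-distributed with 5 degrees of freedom; the paper's question is which approximation is better at finite sample sizes.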
Mutual information of contingency tables and related inequalities
  • P. Harremoës
  • Mathematics, Computer Science
  • 2014 IEEE International Symposium on Information Theory
  • 2014
The signed log-likelihood is introduced and it is demonstrated that its distribution function can be related to the distribution function of a standard Gaussian by inequalities, and a general conjecture about how close the signed log-likelihood is to a standard Gaussian is formulated.
Some Refinements of Large Deviation Tail Probabilities (May 2012)
If μ is close to the mean of X1 one would usually approximate Pn,μ by a tail probability of a Gaussian random variable. If μ is far from the mean of X1 the tail probability can be estimated using…
Multinomial Concentration in Relative Entropy at the Ratio of Alphabet and Sample Sizes
We show that the moment generating function of the Kullback-Leibler divergence between the empirical distribution of $n$ independent samples from a distribution $P$ over a finite alphabet of size $k$…
Bayesian Learning of Relationships
This dissertation develops new statistical techniques for learning relationships inside data from a Bayesian perspective, including a novel hypothesis testing tool to probe a general relationship, either linear or nonlinear.
How Many Samples Required in Big Data Collection: A Differential Message Importance Measure
It is proved that the change of DMIM can describe the gap between the distribution of a set of sample values and a theoretical distribution, and that the empirical distribution approaches the real distribution as the DMIM deviation decreases.
Bounds on tail probabilities for negative binomial distributions
Various new inequalities are presented for tail probabilities for distributions that are elements of the most important exponential families, including the Poisson, Gamma, binomial, negative binomial, and inverse Gaussian distributions.
Bounds on Tail Probabilities in Exponential families
In this paper we present various new inequalities for tail probabilities for distributions that are elements of the most important exponential families. These families include the Poisson…
Differential Message Importance Measure: A New Approach to the Required Sampling Number in Big Data Structure Characterization
A new approach to the required sampling number is proposed, in which the DMIM deviation is constructed to characterize the process of collecting message importance. A connection between message importance and distribution goodness-of-fit is established, verifying that it is feasible to take message importance into account when analyzing data collection.
Algorithmic Fairness Revisited
We study fairness in algorithmic decision making, with the goal of proposing formal and robust definitions and measures of an algorithm's bias towards sensitive user features. The main contribution…
Statistical Methods for Analyzing Time Series Data Drawn from Complex Social Systems
This thesis develops a non-parametric method for smoothing time series data corrupted by serially correlated noise, and demonstrates that user behavior, while quite complex, belies simple underlying computational structures.


χ2 and classical exact tests often wildly misreport significance; the remedy lies in computers
If a discrete probability distribution in a model being tested for goodness-of-fit is not close to uniform, then forming the Pearson χ2 statistic can involve division by nearly zero. This often leads…
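The division-by-nearly-zero problem is easy to exhibit numerically: a single observation landing in a cell whose expected count is tiny dominates the Pearson statistic, while the G² statistic only gains a logarithmic term from the same cell. A small sketch with invented counts:

```python
import math

def chi2_stat(obs, exp):
    # Pearson chi-square: sum over cells of (O - E)^2 / E.
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

def g2_stat(obs, exp):
    # G^2 (information divergence): 2 * sum of O * ln(O / E); O = 0 terms vanish.
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(obs, exp) if o > 0)

# Model probabilities with a near-zero cell; n = 1000 samples.
n = 1000
probs = [0.4999995, 0.4999995, 1e-6]
expected = [n * p for p in probs]   # expected count of the rare cell is 0.001
observed = [500, 499, 1]            # one sample happens to land in the rare cell

# That one cell contributes (1 - 0.001)^2 / 0.001, roughly 998, to Pearson's
# statistic, while its G^2 contribution is only 2 * ln(1000), roughly 13.8.
print(chi2_stat(observed, expected))
print(g2_stat(observed, expected))
```

This is why, as the paper argues, asymptotic χ² p-values for the Pearson statistic can wildly misreport significance for non-uniform null distributions, and exact computation is preferable.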
On the Bahadur-Efficient Testing of Uniformity by Means of the Entropy
It is proved that the information divergence statistic in this problem is more efficient in the Bahadur sense than any other power divergence statistic, which means that the entropy provides the most efficient way of characterizing the uniformity of a distribution.
Large deviations of divergence measures on partitions
We discuss Chernoff-type large deviation results for the total variation, the I-divergence errors, and the χ2-divergence errors on partitions. In contrast to the total variation and the…
The classical problem of the choice of the number of classes in testing goodness of fit is considered for a class of alternatives, for the chi-square and likelihood ratio statistics. Pitman and Bahadur…
Accurate Methods for the Statistics of Surprise and Coincidence
A measure based on likelihood ratios that can be applied to the analysis of text is described; in cases where traditional contingency table methods work well, the likelihood ratio tests described here give nearly identical results.
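The likelihood-ratio measure in question is the G² statistic applied to a 2×2 contingency table, with expected counts taken from the row and column marginals. A minimal sketch; the bigram counts below are hypothetical, chosen only to illustrate the computation:

```python
import math

def g2_table(table):
    # G^2 (log-likelihood ratio) statistic for a 2x2 table of observed counts,
    # with expected counts derived from the row/column marginals.
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    g = 0.0
    for i in range(2):
        for j in range(2):
            k = table[i][j]
            if k == 0:
                continue  # 0 * ln 0 = 0 by convention
            expected = row_sums[i] * col_sums[j] / n
            g += k * math.log(k / expected)
    return 2.0 * g

# Hypothetical corpus counts for a word pair:
# [[both words together, first word alone],
#  [second word alone,  neither word]]
table = [[110, 2442], [111, 29114]]
print(g2_table(table))  # large value indicates strong association
```

Unlike the Pearson statistic, this works reasonably well even when some expected counts are small, which is the paper's motivation for preferring it in text analysis.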
Information-theoretic methods in testing the goodness of fit
We present a new approach to evaluating the efficiency of information-divergence-type statistics for testing the goodness of fit. Since the Pitman approach is too weak to detect sufficiently sharply…
Efficiency of entropy testing
  • P. Harremoës, I. Vajda
  • Computer Science, Mathematics
  • 2008 IEEE International Symposium on Information Theory
  • 2008
It is shown that in a certain sense Shannon entropy is more efficient than Rényi entropy for α ∈ [0, 1].
Testing Statistical Hypotheses
The General Decision Problem. The Probability Background. Uniformly Most Powerful Tests. Unbiasedness: Theory and First Applications. Unbiasedness: Applications to Normal Distributions. …
Limiting distribution of the G statistics
The G statistic and its local version have been used extensively in spatial data analysis. The paper proves the asymptotic normality of the G statistic. Theorems in this paper imply that the regular…