# Information Measures: The Curious Case of the Binary Alphabet

```bibtex
@article{Jiao2014InformationMT,
  title   = {Information Measures: The Curious Case of the Binary Alphabet},
  author  = {Jiantao Jiao and Thomas A. Courtade and Albert No and Kartik Venkat and Tsachy Weissman},
  journal = {IEEE Transactions on Information Theory},
  year    = {2014},
  volume  = {60},
  pages   = {7616-7626}
}
```

Four problems related to information divergence measures defined on finite alphabets are considered. In three of the cases we consider, we illustrate a contrast that arises between the binary-alphabet and larger-alphabet settings. In some instances this is surprising, since characterizations established for larger alphabets do not carry over to their binary-alphabet counterparts. In particular, we show that f-divergences are not the unique decomposable divergences on binary alphabets that satisfy…

#### 32 Citations

Information divergences and the curious case of the binary alphabet

- Mathematics, Computer Science · 2014 IEEE International Symposium on Information Theory
- 2014

It is shown that f-divergences are not the unique decomposable divergences on binary alphabets that satisfy the data processing inequality, despite contrary claims in the literature.
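
As a concrete numerical illustration (mine, not from the paper), the data processing inequality for the KL divergence, the canonical f-divergence, says that passing both distributions through the same channel cannot increase the divergence:

```python
import numpy as np

# Illustration of the data processing inequality for the KL divergence:
# pushing P and Q through the same channel W cannot increase D(P||Q).

def kl(p, q):
    """Kullback-Leibler divergence D(p||q) in nats."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(np.where(p > 0, p * np.log(p / q), 0.0)))

p = np.array([0.7, 0.3])
q = np.array([0.4, 0.6])

# Row-stochastic channel: the output distribution of input r is r @ W.
W = np.array([[0.9, 0.1],
              [0.2, 0.8]])

assert kl(p @ W, q @ W) <= kl(p, q)  # data processing inequality holds
```

Here `kl(p @ W, q @ W)` evaluates to roughly 0.09 nats versus roughly 0.18 nats for `kl(p, q)`, consistent with the inequality.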

Bregman Divergence Bounds and Universality Properties of the Logarithmic Loss

- Computer Science, Mathematics · IEEE Transactions on Information Theory
- 2020

It is shown that for binary classification, the divergence associated with smooth, proper, and convex loss functions is upper bounded by the Kullback-Leibler (KL) divergence, to within a normalization constant, which implies that the logarithmic loss is universal in the sense of providing performance guarantees with respect to a broad class of accuracy measures.

On a generalization of the Jensen-Shannon divergence and the JS-symmetrization of distances relying on abstract means

- Computer Science, Mathematics · ArXiv
- 2019

This work presents a generalization of the Jensen-Shannon (JS) divergence using abstract means which yields closed-form expressions when the mean is chosen according to the parametric family of distributions.
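
A small sketch (my notation, not the paper's exact construction) of the idea: the classical Jensen-Shannon divergence measures both distributions against their arithmetic mean, and the generalization swaps in an abstract mean such as a normalized geometric mean:

```python
import numpy as np

def kl(p, q):
    """KL divergence D(p||q) in nats (assumes full support)."""
    return float(np.sum(p * np.log(p / q)))

def js(p, q):
    m = 0.5 * (p + q)       # arithmetic mean -> classical JS divergence
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def js_abstract_geometric(p, q):
    g = np.sqrt(p * q)
    g = g / g.sum()         # normalized geometric mean as the reference
    return 0.5 * kl(p, g) + 0.5 * kl(q, g)

p = np.array([0.7, 0.3])
q = np.array([0.4, 0.6])
assert 0.0 <= js(p, q) <= np.log(2)  # classical JS is bounded by ln 2
```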

Bregman Divergence Bounds and the Universality of the Logarithmic Loss

- Mathematics, Computer Science · ArXiv
- 2018

This work shows that for binary classification, the divergence associated with smooth, proper and convex loss functions is bounded from above by the Kullback-Leibler (KL) divergence, up to a normalization constant, which suggests that the log-loss is universal in the sense that it provides performance guarantees to a broad class of accuracy measures.

Entropy on Spin Factors.

- Mathematics
- 2016

Recently it has been demonstrated that the Shannon entropy and the von Neumann entropy are the only entropy functions that generate a local Bregman divergence as long as the state space has rank 3 or…

Entropy on Spin Factors

- Physics, Mathematics
- 2017

Recently it has been demonstrated that the Shannon entropy and the von Neumann entropy are the only entropy functions that generate a local Bregman divergence as long as the state space has rank 3 or…

On w-mixtures: Finite convex combinations of prescribed component distributions

- Computer Science · ArXiv
- 2017

It is shown how the Kullback-Leibler (KL) divergence can be recovered from the corresponding Bregman divergence for the negentropy generator and proved that the statistical skew Jensen-Shannon divergence between $w$-mixtures is equivalent to a skew Jensen divergence between their corresponding parameters.
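
The recovery of KL from the negentropy generator alluded to above can be checked numerically; a minimal sketch (notation mine):

```python
import numpy as np

# The Bregman divergence generated by the negentropy
# F(p) = sum_i p_i log p_i coincides with the KL divergence on
# probability vectors.

def negentropy(p):
    return float(np.sum(p * np.log(p)))

def bregman_negentropy(p, q):
    # B_F(p, q) = F(p) - F(q) - <grad F(q), p - q>
    grad_q = np.log(q) + 1.0
    return negentropy(p) - negentropy(q) - float(grad_q @ (p - q))

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])
assert abs(bregman_negentropy(p, q) - kl(p, q)) < 1e-12
```

The identity follows because the inner-product term cancels the cross-entropy contributions when both vectors sum to one.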

Justification of Logarithmic Loss via the Benefit of Side Information

- Computer Science, Mathematics · IEEE Transactions on Information Theory
- 2014

This work provides a new characterization of mutual information and justifies its use as a measure of relevance; the results extend naturally to measuring the causal influence between stochastic processes.

Introducing Information Measures via Inference [Lecture Notes]

- Mathematics, Computer Science · IEEE Signal Processing Magazine
- 2018

The goal of this lecture note is to describe a principled and intuitive introduction to information measures that builds on inference, i.e., estimation and hypothesis testing.

#### References

Showing 1-10 of 34 references

$\alpha$-Divergence Is Unique, Belonging to Both $f$-Divergence and Bregman Divergence Classes

- Computer Science, Mathematics · IEEE Transactions on Information Theory
- 2009

It is proved that the alpha-divergences constitute a unique class belonging to both classes when the space of positive measures or positive arrays is considered, and that this is the only such class in the space of probability distributions.
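
For reference, Amari's $\alpha$-divergence family, in one common parameterization (the limits $\alpha \to \mp 1$ recover the KL divergence and its reverse), is

```latex
D_\alpha(p \,\|\, q) \;=\; \frac{4}{1-\alpha^{2}}\left(1-\sum_i p_i^{\frac{1-\alpha}{2}}\, q_i^{\frac{1+\alpha}{2}}\right), \qquad \alpha \neq \pm 1 .
```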

Determination of all semisymmetric recursive information measures of multiplicative type on n positive discrete probability distributions

- Mathematics
- 1983

Information measures Δm (entropies, divergences, inaccuracies, information improvements, etc.), depending upon n probability distributions which we unite into a vector distribution, are recursive of…

Arimoto channel coding converse and Rényi divergence

- Mathematics · 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
- 2010

Arimoto [1] proved a non-asymptotic upper bound on the probability of successful decoding achievable by any code on a given discrete memoryless channel. In this paper we present a simple derivation…

The Information Bottleneck Revisited or How to Choose a Good Distortion Measure

- Mathematics, Computer Science · 2007 IEEE International Symposium on Information Theory
- 2007

It is shown that the information bottleneck method has some properties that are not shared with rate distortion theory based on any other divergence measure, which makes it unique.

Strictly Proper Scoring Rules, Prediction, and Estimation

- Mathematics
- 2007

Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper…

α-divergence is unique, belonging to both f-divergence and Bregman divergence classes

- Mathematics
- 2009

A divergence measure between two probability distributions or positive arrays (positive measures) is a useful tool for solving optimization problems in optimization, signal processing, and machine learning…

About distances of discrete distributions satisfying the data processing theorem of information theory

- Mathematics, Computer Science · IEEE Trans. Inf. Theory
- 1997

Necessary and sufficient conditions for validity of the data processing theorem of information theory are established and applied to the Burbea-Rao (1982) divergences and Bregman (1967) distances.

Markov Processes and the H-Theorem

- Mathematics
- 1963

The H-theorem is investigated in view of Markov processes. The proof is valid even in fields other than physics, since none of the physical relations, such as the principle of microscopic…

On Information and Sufficiency

- Mathematics
- 1997

The information deviation between any two finite measures cannot be increased by any statistical operations (Markov morphisms). It is invariant if and only if the morphism is sufficient for these two…

On the inequality $\sum p_i f(p_i) \geq \sum p_i f(q_i)$

- Mathematics
- 1972

Summary: We consider inequality (2) and prove the following results. The general solution of inequality (2) is monotonically increasing. The general solution of inequality (2) is…
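
The special case $f = \log$ of the inequality in this reference's title reduces to Gibbs' inequality, i.e. non-negativity of the KL divergence, which is easy to check numerically (illustration mine):

```python
import numpy as np

# Numerical check of the special case f = log: the inequality
# sum_i p_i f(p_i) >= sum_i p_i f(q_i) becomes Gibbs' inequality,
# sum_i p_i log p_i >= sum_i p_i log q_i, i.e. D(p||q) >= 0.

rng = np.random.default_rng(0)
for _ in range(100):
    p = rng.dirichlet(np.ones(4))  # random probability vectors
    q = rng.dirichlet(np.ones(4))
    assert np.sum(p * np.log(p)) >= np.sum(p * np.log(q)) - 1e-12
```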