Information Measures: The Curious Case of the Binary Alphabet

@article{Jiao2014InformationMT,
  title={Information Measures: The Curious Case of the Binary Alphabet},
  author={Jiantao Jiao and Thomas A. Courtade and Albert No and Kartik Venkat and Tsachy Weissman},
  journal={IEEE Transactions on Information Theory},
  year={2014},
  volume={60},
  pages={7616-7626}
}
Four problems related to information divergence measures defined on finite alphabets are considered. In three of the cases we consider, we illustrate a contrast that arises between the binary-alphabet and larger-alphabet settings. This is surprising in some instances, since characterizations for the larger-alphabet settings do not generalize their binary-alphabet counterparts. In particular, we show that f-divergences are not the unique decomposable divergences on binary alphabets that satisfy…
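For orientation, here is a brief recap of the objects named in the abstract, using standard definitions rather than text from the paper. An f-divergence between distributions P and Q on a finite alphabet is

  D_f(P \| Q) = \sum_i q_i \, f(p_i / q_i),  with f convex and f(1) = 0.

A divergence D is decomposable if it is a sum of per-symbol terms, D(P, Q) = \sum_i d(p_i, q_i), and it satisfies the data processing inequality if D(PW, QW) \le D(P, Q) for every channel (row-stochastic matrix) W. The abstract's point is that on binary alphabets the class of decomposable divergences satisfying data processing is strictly larger than the class of f-divergences.

The following minimal Python sketch (my own illustration, not code from the paper; it assumes only numpy) spot-checks the data processing inequality for two f-divergences on a binary alphabet:

  import numpy as np

  def f_divergence(p, q, f):
      # D_f(P || Q) = sum_i q_i * f(p_i / q_i); assumes all q_i > 0
      p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
      return float(np.sum(q * f(p / q)))

  def f_kl(t):
      # generator of the KL divergence: f(t) = t log t (convex, f(1) = 0)
      return t * np.log(t)

  def f_tv(t):
      # generator of total variation: f(t) = |t - 1| / 2
      return np.abs(t - 1.0) / 2.0

  rng = np.random.default_rng(0)
  for _ in range(1000):
      p = rng.dirichlet([1.0, 1.0])          # a binary distribution P
      q = rng.dirichlet([1.0, 1.0])          # a binary distribution Q
      W = rng.dirichlet([1.0, 1.0], size=2)  # a random 2x2 channel (rows sum to 1)
      for f in (f_kl, f_tv):
          # Data processing inequality: D_f(PW || QW) <= D_f(P || Q)
          assert f_divergence(p @ W, q @ W, f) <= f_divergence(p, q, f) + 1e-9
  print("data processing inequality held in all sampled cases")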
Citations

Information divergences and the curious case of the binary alphabet
TLDR
It is shown that f-divergences are not the unique decomposable divergences on binary alphabets that satisfy the data processing inequality, despite contrary claims in the literature.
Bregman Divergence Bounds and Universality Properties of the Logarithmic Loss
TLDR
It is shown that for binary classification, the divergence associated with smooth, proper, and convex loss functions is upper bounded by the Kullback-Leibler (KL) divergence, to within a normalization constant, which implies that the logarithmic loss is universal in the sense of providing performance guarantees with respect to a broad class of accuracy measures.
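As a reminder of the objects involved (standard definitions; the bound is stated only schematically, following the summary above): the Bregman divergence generated by a differentiable convex function G is

  B_G(p, q) = G(p) - G(q) - \langle \nabla G(q), p - q \rangle,

and the claim summarized above has the shape B_G(p, q) \le c_G \, D_{\mathrm{KL}}(p \| q) for the divergences associated with smooth, proper, convex binary losses, where c_G is a loss-dependent normalization constant.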
On a generalization of the Jensen-Shannon divergence and the JS-symmetrization of distances relying on abstract means
TLDR
This work presents a generalization of the Jensen-Shannon (JS) divergence using abstract means which yields closed-form expressions when the mean is chosen according to the parametric family of distributions.
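For context, the standard Jensen-Shannon divergence is

  \mathrm{JS}(P, Q) = \tfrac{1}{2} \mathrm{KL}(P \| M) + \tfrac{1}{2} \mathrm{KL}(Q \| M),  with M = \tfrac{1}{2}(P + Q),

and, as I read the summary, the cited generalization replaces the arithmetic mean M by an abstract mean (for example a geometric or harmonic mean), with closed-form expressions obtained when the mean is matched to the parametric family at hand.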
Bregman Divergence Bounds and the Universality of the Logarithmic Loss
TLDR
This work shows that for binary classification, the divergence associated with smooth, proper and convex loss functions is bounded from above by the Kullback-Leibler (KL) divergence, up to a normalization constant, which suggests that the log-loss is universal in the sense that it provides performance guarantees to a broad class of accuracy measures.
Entropy on Spin Factors
Recently it has been demonstrated that the Shannon entropy or the von Neumann entropy are the only entropy functions that generate a local Bregman divergence as long as the state space has rank 3 or
On w-mixtures: Finite convex combinations of prescribed component distributions
TLDR
It is shown how the Kullback-Leibler (KL) divergence can be recovered from the corresponding Bregman divergence for the negentropy generator, and it is proved that the statistical skew Jensen-Shannon divergence between w-mixtures is equivalent to a skew Jensen divergence between their corresponding parameters.
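The KL-from-Bregman statement rests on a standard identity, sketched here for the discrete case rather than quoted from the paper: with the negentropy generator F(p) = \sum_i p_i \log p_i, the Bregman divergence is

  B_F(p, q) = \sum_i p_i \log p_i - \sum_i q_i \log q_i - \sum_i (\log q_i + 1)(p_i - q_i) = \sum_i p_i \log(p_i / q_i) - \sum_i (p_i - q_i),

which equals \mathrm{KL}(p \| q) exactly when both arguments are normalized, since then \sum_i (p_i - q_i) = 0.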
Justification of Logarithmic Loss via the Benefit of Side Information
TLDR
This work provides a new characterization of mutual information and justifies its use as a measure of relevance; the results extend naturally to measuring the causal influence between stochastic processes.
Introducing Information Measures via Inference [Lecture Notes]
  • O. Simeone
  • Mathematics, Computer Science
    IEEE Signal Processing Magazine
  • 2018
TLDR
The goal of this lecture note is to provide a principled and intuitive introduction to information measures that builds on inference, i.e., estimation and hypothesis testing.

References

Showing 1-10 of 34 references
$\alpha$-Divergence Is Unique, Belonging to Both $f$-Divergence and Bregman Divergence Classes
  • S. Amari
  • Computer Science, Mathematics
    IEEE Transactions on Information Theory
  • 2009
TLDR
It is proved that the alpha-divergences constitute a unique class belonging to both classes when the space of positive measures or positive arrays is considered, and that this is the only such class in the space of probability distributions.
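For reference, in one common parameterization (Amari's convention; the sign and scaling of \alpha vary across the literature, so treat the constants as indicative), the \alpha-divergence between positive measures p and q is

  D_\alpha(p \| q) = \frac{4}{1 - \alpha^2} \sum_i \left( \frac{1 - \alpha}{2} p_i + \frac{1 + \alpha}{2} q_i - p_i^{(1-\alpha)/2} q_i^{(1+\alpha)/2} \right),  \alpha \ne \pm 1,

with the KL divergences recovered in the limits \alpha \to \pm 1; on probability distributions the affine terms sum to 1.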
Determination of all semisymmetric recursive information measures of multiplicative type on n positive discrete probability distributions
Information measures Δm (entropies, divergences, inaccuracies, information improvements, etc.), depending upon n probability distributions which we unite into a vector distribution, are recursive of
Arimoto channel coding converse and Rényi divergence
  • Yury Polyanskiy, S. Verdú
  • Mathematics
    2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
  • 2010
Arimoto [1] proved a non-asymptotic upper bound on the probability of successful decoding achievable by any code on a given discrete memoryless channel. In this paper we present a simple derivation
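For reference (a standard definition, not specific to this paper's derivation), the Rényi divergence of order \alpha \in (0, 1) \cup (1, \infty) between distributions P and Q on a finite alphabet is

  D_\alpha(P \| Q) = \frac{1}{\alpha - 1} \log \sum_i p_i^\alpha q_i^{1 - \alpha},

which recovers the Kullback-Leibler divergence in the limit \alpha \to 1.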
The Information Bottleneck Revisited or How to Choose a Good Distortion Measure
TLDR
It is shown that the information bottleneck method has some properties that are not shared by rate-distortion theory based on any other divergence measure, which makes it unique.
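As context for the comparison with rate-distortion theory (the standard formulation of the method, not a claim about this paper's proofs), the information bottleneck seeks a compressed representation T of X that stays informative about a relevance variable Y, for example by minimizing the Lagrangian

  \mathcal{L}[p(t \mid x)] = I(X; T) - \beta I(T; Y),

which can be read as a rate-distortion problem whose (measure-dependent) distortion is the KL divergence d(x, t) = \mathrm{KL}(p(y \mid x) \| p(y \mid t)).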
Strictly Proper Scoring Rules, Prediction, and Estimation
Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper
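To fix terminology (with positively oriented scores, as in Gneiting and Raftery's convention): a scoring rule S is proper if reporting the true distribution maximizes the expected score,

  \mathbb{E}_{X \sim P}[S(Q, X)] \le \mathbb{E}_{X \sim P}[S(P, X)]  for all Q,

and strictly proper if equality forces Q = P; the logarithmic score S(Q, x) = \log q(x) is the canonical strictly proper example.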
α-divergence is unique, belonging to both f-divergence and Bregman divergence classes
A divergence measure between two probability distributions or positive arrays (positive measures) is a useful tool for solving optimization problems in signal processing, machine lear...
About distances of discrete distributions satisfying the data processing theorem of information theory
TLDR
Necessary and sufficient conditions for the validity of the data processing theorem of information theory are established, and the Burbea-Rao (1982) divergences and Bregman (1967) distances are examined in this context.
Markov Processes and the H-Theorem
The H-theorem is investigated in view of Markov processes. The proof is valid even in fields other than physics, since none of the physical relations, such as the principle of microscopic
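The information-theoretic core of this line of argument can be stated as a monotonicity property, recalled here as a standard fact rather than a quotation from the paper: if two distributions p and q are pushed through the same Markov kernel K, then relative entropy does not increase,

  D(pK \| qK) \le D(p \| q),

so in particular the divergence from a stationary distribution is non-increasing along the chain, which is the H-theorem in this setting.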
On Information and Sufficiency
The information deviation between any two finite measures cannot be increased by any statistical operations (Markov morphisms). It is invariant if and only if the morphism is sufficient for these two
On the inequality $\sum p_i f(p_i) \geq \sum p_i f(q_i)$
Summary: We consider inequality (2) and prove the following results. The general solution of inequality (2) is monotonically increasing. The general solution of inequality (2) is
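With f(x) = \log x, the inequality in the title specializes to the Shannon (Gibbs) inequality

  \sum_i p_i \log p_i \ge \sum_i p_i \log q_i,

i.e. non-negativity of the Kullback-Leibler divergence \sum_i p_i \log(p_i / q_i) \ge 0; the cited paper characterizes the general solutions f of this functional inequality.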