# Notes on Kullback-Leibler Divergence and Likelihood

@article{Shlens2014NotesOK, title={Notes on Kullback-Leibler Divergence and Likelihood}, author={Jonathon Shlens}, journal={ArXiv}, year={2014}, volume={abs/1404.2000} }

The Kullback-Leibler (KL) divergence is a fundamental equation of information theory that quantifies the proximity of two probability distributions. Although difficult to understand by examining the equation, an intuition and understanding of the KL divergence arises from its intimate relationship with likelihood theory. We discuss how KL divergence arises from likelihood theory in an attempt to provide some intuition and reserve a rigorous (but rather simple) derivation for the appendix…
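The likelihood connection described in the abstract can be sketched numerically: for samples drawn from a distribution p, the average gap in log-likelihood between the true model p and a candidate model q converges to D_KL(p‖q). A minimal sketch (the distributions and sample size below are arbitrary illustrative choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two discrete distributions over the same support (illustrative values).
p = np.array([0.5, 0.3, 0.2])   # "true" data-generating distribution
q = np.array([0.4, 0.4, 0.2])   # candidate model

# Analytic KL divergence: D_KL(p || q) = sum_i p_i * log(p_i / q_i)
kl_analytic = np.sum(p * np.log(p / q))

# Likelihood view: draw samples from p; the average log-likelihood ratio
# log p(x) - log q(x) under samples from p estimates D_KL(p || q).
n = 200_000
samples = rng.choice(len(p), size=n, p=p)
kl_monte_carlo = np.mean(np.log(p[samples]) - np.log(q[samples]))

print(f"analytic D_KL(p||q)  = {kl_analytic:.4f}")
print(f"Monte Carlo estimate = {kl_monte_carlo:.4f}")
```

The two numbers agree up to sampling error, which is the sense in which KL divergence measures the expected log-likelihood penalty for modeling data from p with q.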

## 51 Citations

Large Sample Asymptotic for Nonparametric Mixture Model with Count Data

- Computer Science, Mathematics
- 2015

The large-sample asymptotic for count data, as the number of samples in the multinomial distribution goes to infinity, is presented, and a result similar to SVA for scalable clustering is derived.

An Introduction to Variational Inference

- Computer Science, ArXiv
- 2021

This paper introduces the concept of variational inference (VI), a popular method in machine learning that uses optimization techniques to estimate complex probability densities, and discusses applications of VI to variational auto-encoders and the VAE-Generative Adversarial Network.
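The role of the KL divergence in variational inference can be made explicit via a standard identity (not quoted from the cited paper): for a model p(x, z) with intractable posterior p(z | x) and a variational distribution q(z),

```latex
\log p(x)
= \underbrace{\mathbb{E}_{q(z)}\!\left[\log \frac{p(x,z)}{q(z)}\right]}_{\text{ELBO}}
+ D_{\mathrm{KL}}\!\left(q(z)\,\|\,p(z \mid x)\right)
```

Since log p(x) does not depend on q, maximizing the evidence lower bound (ELBO) over q is equivalent to minimizing the KL divergence from q to the true posterior.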

Optimization Models of Natural Communication

- Computer Science, J. Quant. Linguistics
- 2018

Two important components of the family, namely the information theoretic principles and the energy function that combines them linearly, are reviewed from the perspective of psycholinguistics, language learning, information theory and synergetic linguistics.

Entropy and geometry of quantum states

- Physics, International Journal of Quantum Information
- 2018

We compare the roles of the Bures–Helstrom (BH) and Bogoliubov–Kubo–Mori (BKM) metrics in the subject of quantum information geometry. We note that there are two limits involved in state…

The shape of terrestrial abundance distributions

- Environmental Science, Science Advances
- 2015

A new, low-dominance distribution of terrestrial abundance is proposed: the double geometric, which assumes both that richness is finite and that species compete unequally for resources in a two-dimensional niche landscape; this implies that niche breadths are variable and that trait distributions are neither arrayed along a single dimension nor randomly associated.

Bayesian Root Cause Analysis by Separable Likelihoods

- Computer Science, SOFSEM
- 2019

This paper proposes a framework for simple and friendly RCA within the Bayesian regime under certain restrictions (namely that Hessian at the mode is diagonal, in this work referred to as separability) imposed on the predictive posterior.

Wasserstein Distance Guided Cross-Domain Learning

- Computer Science, ArXiv
- 2019

This work proposes a new approach to infer the joint distribution of images from different distributions, namely Wasserstein Distance Guided Cross-Domain Learning (WDGCDL), which applies the Wasserstein distance to estimate the divergence between the source and target distributions, providing good gradient properties and a promising generalisation bound.

Piecewise constant nonnegative matrix factorization

- Computer Science, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2014

This paper proposes a non-negative matrix factorization (NMF) model with piecewise-constant activation coefficients that is enforced using a total variation penalty on the rows of the activation matrix and uses it to solve a video structuring problem that involves both segmentation and clustering tasks.

Non-uniform Quantized Distributed Sensing in Practical Wireless Rayleigh Fading Channel

- Computer Science, CrownCom
- 2015

The effect of collaboration in the CPAC scheme on the performance of distributed sensing, compared with a non-cooperative scheme, is investigated, and the sensitivity of the proposed quantization scheme to the average symbol error probability is illustrated.

Interpretable Convolution Methods for Learning Genomic Sequence Motifs

- Computer Science, Biology, bioRxiv
- 2018

This work proposes a schema to learn sequence motifs directly through weight constraints and transformations such that the individual weights comprising the filter are directly interpretable as either position weight matrices (PWMs) or information gain matrices (IGMs).

## References


Probability theory: the logic of science

- Physics
- 2005

This is a remarkable book by a remarkable scientist. E. T. Jaynes was a physicist, principally theoretical, who found himself driven to spend much of his life advocating, defending and developing a…

Correlation and Independence in the Neural Code

- Computer Science, Neural Computation
- 2006

The Nirenberg-Latham loss is elucidated from the point of view of information geometry and how much information is lost by using this unfaithful model for decoding is investigated.

Synergy, Redundancy, and Independence in Population Codes, Revisited

- Computer Science, The Journal of Neuroscience
- 2005

It is shown that synergy and ΔI_shuffled are confounded measures: they can be zero when correlations are clearly important for decoding and positive when they are not; in contrast, ΔI is not confounded and has an information-theoretic interpretation.

Weak pairwise correlations imply strongly correlated network states in a neural population

- Biology, Nature
- 2006

It is shown, in the vertebrate retina, that weak correlations between pairs of neurons coexist with strongly collective behaviour in the responses of ten or more neurons, and it is found that this collective behaviour is described quantitatively by models that capture the observed pairwise correlations but assume no higher-order interactions.

Elements of Information Theory

- Computer Science
- 1991

The authors examine the role of entropy, inequalities, and randomness in the design and construction of codes in a rapidly changing environment.

Network information and connected correlations.

- Computer Science, Mathematics, Physical Review Letters
- 2003

The information theoretic analog of connected correlation functions is constructed: irreducible N-point correlation is measured by a decrease in entropy for the joint distribution of N variables relative to the maximum entropy allowed by all the observed N-1 variable distributions.

The Structure of Multi-Neuron Firing Patterns in Primate Retina

- Biology, The Journal of Neuroscience
- 2006

Large-scale multi-electrode recordings were used to measure electrical activity in nearly complete, regularly spaced mosaics of several hundred ON and OFF parasol retinal ganglion cells in macaque monkey retina, and pairwise and adjacent interactions accurately accounted for the structure and prevalence of multi-neuron firing patterns.

A Mathematical Theory of Communication

- Computer Science
- 2006

It is proved that a positive data rate can be achieved with arbitrarily small error probability, and that there is an upper bound on the data rate above which no encoding scheme can achieve a small enough error probability.

Pattern Classification

- Mathematics, Environmental Science, Springer London
- 2001

An outline of supervised classification methods, including parallelepiped, minimum distance, and maximum likelihood (Bayes rule) classifiers (both non-parametric and parametric), support vector machines, neural networks, and context classification…