• Corpus ID: 14553804

Notes on Kullback-Leibler Divergence and Likelihood

  • Jonathon Shlens
The Kullback-Leibler (KL) divergence is a fundamental quantity of information theory that measures the proximity of two probability distributions. Although the divergence is difficult to grasp by examining its equation alone, an intuition and understanding of it arises from its intimate relationship with likelihood theory. We discuss how the KL divergence arises from likelihood theory in an attempt to provide some intuition, and reserve a rigorous (but rather simple) derivation for the appendix… 
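The connection to likelihood described in the abstract can be illustrated numerically: the KL divergence D(p‖q) equals the expected log-likelihood ratio under p. A minimal sketch (NumPy, with two hypothetical discrete distributions chosen for illustration):

```python
import numpy as np

# Two hypothetical discrete distributions over the same support.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

# KL divergence: D(p || q) = sum_i p_i * log(p_i / q_i).
kl = np.sum(p * np.log(p / q))

# Equivalently, the expected log-likelihood ratio under p,
# estimated here by Monte Carlo from a large sample drawn from p.
rng = np.random.default_rng(0)
samples = rng.choice(len(p), size=200_000, p=p)
mc_estimate = np.mean(np.log(p[samples] / q[samples]))

print(kl)           # exact value
print(mc_estimate)  # approaches the exact value as the sample grows
```

The Monte Carlo estimate converging to the analytic sum is precisely the sense in which KL divergence quantifies how much more likely data generated by p is under p than under q.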
Large Sample Asymptotic for Nonparametric Mixture Model with Count Data
The large-sample asymptotic for count data, in which the number of samples in the Multinomial distribution goes to infinity, is presented, and a result similar to SVA (small-variance asymptotics) for scalable clustering is derived.
An Introduction to Variational Inference
This paper introduces the concept of Variational Inference (VI), a popular method in machine learning that uses optimization techniques to estimate complex probability densities, and discusses applications of VI to variational auto-encoders (VAEs) and the VAE-Generative Adversarial Network.
Optimization Models of Natural Communication
Two important components of this family of models, namely the information-theoretic principles and the energy function that combines them linearly, are reviewed from the perspectives of psycholinguistics, language learning, information theory, and synergetic linguistics.
Entropy and geometry of quantum states
We compare the roles of the Bures–Helstrom (BH) and Bogoliubov–Kubo–Mori (BKM) metrics in the subject of quantum information geometry. We note that there are two limits involved in state… 
The shape of terrestrial abundance distributions
  • J. Alroy
  • Environmental Science
    Science Advances
  • 2015
A new, low-dominance distribution of terrestrial abundance is proposed: the double geometric, which assumes both that richness is finite and that species compete unequally for resources in a two-dimensional niche landscape. This implies that niche breadths are variable and that trait distributions are neither arrayed along a single dimension nor randomly associated.
Bayesian Root Cause Analysis by Separable Likelihoods
This paper proposes a framework for simple and practical root cause analysis (RCA) within the Bayesian regime, under certain restrictions imposed on the predictive posterior, namely that the Hessian at the mode is diagonal, referred to in this work as separability.
Wasserstein Distance Guided Cross-Domain Learning
  • Jie Su
  • Computer Science
  • 2019
This work proposes a new approach to infer the joint distribution of images from different distributions, namely Wasserstein Distance Guided Cross-Domain Learning (WDGCDL), which applies the Wasserstein distance to estimate the divergence between the source and target distributions, providing good gradient properties and a promising generalisation bound.
Piecewise constant nonnegative matrix factorization
This paper proposes a non-negative matrix factorization (NMF) model with piecewise-constant activation coefficients, where piecewise constancy is enforced using a total variation penalty on the rows of the activation matrix; the model is used to solve a video structuring problem that involves both segmentation and clustering tasks.
Non-uniform Quantized Distributed Sensing in Practical Wireless Rayleigh Fading Channel
The effect of collaboration in the CPAC scheme on the performance of distributed sensing, compared with a non-cooperative scheme, is investigated, and the sensitivity of the proposed quantization scheme to the average error probability of symbols is illustrated.
Interpretable Convolution Methods for Learning Genomic Sequence Motifs
This work proposes a schema to learn sequence motifs directly through weight constraints and transformations such that the individual weights comprising the filter are directly interpretable as either position weight matrices (PWMs) or information gain matrices (IGMs).


Probability theory: the logic of science
This is a remarkable book by a remarkable scientist. E. T. Jaynes was a physicist, principally theoretical, who found himself driven to spend much of his life advocating, defending and developing a… 
Correlation and Independence in the Neural Code
The Nirenberg-Latham loss is elucidated from the point of view of information geometry and how much information is lost by using this unfaithful model for decoding is investigated.
Synergy, Redundancy, and Independence in Population Codes, Revisited
It is shown that synergy and ΔI_shuffled are confounded measures: they can be zero when correlations are clearly important for decoding, and positive when they are not; in contrast, ΔI is not confounded and has an information-theoretic interpretation.
Weak pairwise correlations imply strongly correlated network states in a neural population
It is shown, in the vertebrate retina, that weak correlations between pairs of neurons coexist with strongly collective behaviour in the responses of ten or more neurons, and it is found that this collective behaviour is described quantitatively by models that capture the observed pairwise correlations but assume no higher-order interactions.
Elements of Information Theory
The author examines the role of entropy, inequality, and randomness in the design and construction of codes in a rapidly changing environment.
Network information and connected correlations.
The information theoretic analog of connected correlation functions is constructed: irreducible N-point correlation is measured by a decrease in entropy for the joint distribution of N variables relative to the maximum entropy allowed by all the observed N-1 variable distributions.
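For N = 2 this construction reduces to the familiar mutual information: the maximum-entropy distribution consistent with both one-variable marginals is the product distribution, so the entropy decrease is H(X) + H(Y) − H(X, Y). A minimal sketch (NumPy, with a hypothetical joint distribution of two binary variables):

```python
import numpy as np

# Hypothetical joint distribution of two correlated binary variables.
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])

def entropy(p):
    """Shannon entropy in bits of a probability array."""
    p = p[p > 0]  # ignore zero-probability entries (0 * log 0 = 0)
    return -np.sum(p * np.log2(p))

px = pxy.sum(axis=1)  # marginal of X
py = pxy.sum(axis=0)  # marginal of Y

# Irreducible 2-point correlation: entropy of the maximum-entropy
# (independent) model minus entropy of the true joint distribution.
connected_info = entropy(px) + entropy(py) - entropy(pxy)
```

For N > 2 the maximum-entropy distribution consistent with all (N−1)-variable marginals generally has no closed form and must be found numerically (e.g. by iterative scaling), but the same entropy-difference definition applies.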
The Structure of Multi-Neuron Firing Patterns in Primate Retina
Large-scale multi-electrode recordings were used to measure electrical activity in nearly complete, regularly spaced mosaics of several hundred ON and OFF parasol retinal ganglion cells in macaque monkey retina, and pairwise and adjacent interactions accurately accounted for the structure and prevalence of multi-neuron firing patterns.
A Mathematical Theory of Communication
It is proved that one can achieve a positive data rate with arbitrarily small error probability, and that there is an upper bound on the data rate above which no encoding scheme can achieve arbitrarily small error probability.
Pattern Classification
An outline of classification methods: supervised approaches including the parallelepiped, minimum distance, and maximum likelihood (Bayes rule) classifiers, both parametric and non-parametric; support vector machines; neural networks; and context classification.