• Corpus ID: 15916415

# Optimally approximating exponential families

@article{Rauh2013OptimallyAE,
title={Optimally approximating exponential families},
author={Johannes Rauh},
journal={Kybernetika},
year={2013},
volume={49},
pages={199-215}
}
• J. Rauh
• Published 28 October 2011
• Mathematics, Computer Science
• Kybernetika
This article studies exponential families $\mathcal{E}$ on finite sets such that the information divergence $D(P\|\mathcal{E})$ of an arbitrary probability distribution from $\mathcal{E}$ is bounded by some constant $D>0$. A particular class of low-dimensional exponential families that have low values of $D$ can be obtained from partitions of the state space. The main results concern optimality properties of these partition exponential families. Exponential families where $D=\log(2)$ are…
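As a hedged illustration of the quantity $D(P\|\mathcal{E})$ for a partition exponential family, the sketch below assumes that the closure of such a family consists of the distributions that are constant on each block of the partition. Under that assumption, the closest family member spreads the block mass $P(A)$ uniformly over each block $A$. The function name, the example blocks, and the test distributions are hypothetical and do not come from the article.

```python
import math

def divergence_from_partition_family(p, blocks):
    """Return D(P || E) in nats, assuming E is the set of distributions
    that are constant on each block of the given partition."""
    total = 0.0
    for block in blocks:
        # Total probability that P assigns to this block.
        mass = sum(p[x] for x in block)
        for x in block:
            if p[x] > 0:
                # The assumed projection puts mass / len(block) on each state,
                # so the log-ratio is log(p[x] * len(block) / mass).
                total += p[x] * math.log(p[x] * len(block) / mass)
    return total

# Hypothetical example: four states split into two blocks of size two.
blocks = [(0, 1), (2, 3)]
point_mass = [1.0, 0.0, 0.0, 0.0]
uniform = [0.25, 0.25, 0.25, 0.25]

d_point = divergence_from_partition_family(point_mass, blocks)
d_uniform = divergence_from_partition_family(uniform, blocks)
```

In this toy setting a point mass has divergence $\log(2)$ and the uniform distribution has divergence zero, which is consistent with the abstract's mention of families where $D=\log(2)$ when every block has two elements.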
4 Citations
Scaling of model approximation errors and expected entropy distances
• Computer Science, Mathematics
Kybernetika
• 2014
We compute the expected value of the Kullback-Leibler divergence to various fundamental statistical models with respect to canonical priors on the probability simplex. We obtain closed formulas for
Universal Approximation Depth and Errors of Narrow Belief Networks with Discrete Units
This analysis covers discrete restricted Boltzmann machines and naive Bayes models as special cases and shows that a q-ary deep belief network with layers of width for some can approximate any probability distribution on without exceeding a Kullback-Leibler divergence.
Maximal Information Divergence from Statistical Models Defined by Neural Networks
• Computer Science
GSI
• 2013
We review recent results about the maximal values of the Kullback-Leibler information divergence from statistical models defined by neural networks, including naive Bayes models, restricted Boltzmann
Restricted Boltzmann Machines: Introduction and Review
An introduction to the mathematical analysis of restricted Boltzmann machines is given, recent results on the geometry of the sets of probability distributions representable by these models are reviewed, and a few directions for further investigation are suggested.

## References

Finding the Maximizers of the Information Divergence From an Exponential Family
• J. Rauh
• Computer Science
IEEE Transactions on Information Theory
• 2011
It is shown that the rI-projection of a maximizer $P$ to $\mathcal{E}$ is a convex combination of $P$ and a probability measure $P_-$ with disjoint support and the same value of the sufficient statistics $A$.
On maximization of the information divergence from an exponential family
• Mathematics, Computer Science
• 2003
The information divergence of a probability measure P from an exponential family E over a finite set is defined as infimum of the divergences of P from Q subject to Q in E. For convex exponential
Maximization of the information divergence from an exponential family and criticality
• Mathematics
2011 IEEE International Symposium on Information Theory Proceedings
• 2011
The problem to maximize the information divergence from an exponential family is compared to the maximization of an entropy-like quantity over the boundary of a polytope. First-order conditions on
Information Theory and Statistics: A Tutorial
• Computer Science, Mathematics
Found. Trends Commun. Inf. Theory
• 2004
This tutorial is concerned with applications of information theory concepts in statistics, in the finite alphabet setting, and an introduction is provided to the theory of universal coding, and to statistical inference via the minimum description length principle motivated by that theory.
Inducing Features of Random Fields
• Computer Science
IEEE Trans. Pattern Anal. Mach. Intell.
• 1997
The random field models and techniques introduced in this paper differ from those common to much of the computer vision literature in that the underlying random fields are non-Markovian and have a large number of parameters that must be estimated.
On the toric algebra of graphical models
• Mathematics
• 2006
We formulate necessary and sufficient conditions for an arbitrary discrete probability distribution to factor according to an undirected graphical model, or a log-linear model, or other more general
AN INFORMATION-GEOMETRIC APPROACH TO A THEORY OF PRAGMATIC STRUCTURING
• N. Ay
• Computer Science
• 2002
Theoretical results about the low complexity of optimal solutions for the optimization problem of frequently used measures like the mutual information in an unconstrained and more theoretical setting are established.
Matroid theory
The current status has been given for all the unsolved problems or conjectures that appear in Chapter 14 and the corrected text is given with the inserted words underlined.
Minimax Entropy Principle and Its Application to Texture Modeling
• Computer Science
Neural Computation
• 1997
The minimax entropy principle is applied to texture modeling, where a novel Markov random field model, called FRAME, is derived, and encouraging results are obtained in experiments on a variety of texture images.