# The IM algorithm: a variational approach to Information Maximization

```bibtex
@inproceedings{Barber2003TheIA,
  title     = {The IM algorithm: a variational approach to Information Maximization},
  author    = {David Barber and Felix V. Agakov},
  booktitle = {NIPS 2003},
  year      = {2003}
}
```

The maximisation of information transmission over noisy channels is a common, albeit generally computationally difficult, problem. The paper's key method is a variational lower bound on the mutual information; the resulting IM algorithm is analogous to the EM algorithm, yet maximises mutual information rather than likelihood. We apply the method to several practical examples, including linear compression, population encoding and CDMA.
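A minimal numerical sketch of the bound the paper rests on, I(X;Y) ≥ H(X) + ⟨log q(x|y)⟩ (the discrete alphabet sizes and the randomly drawn channel below are invented for illustration): choosing the decoder q(x|y) to be the exact posterior p(x|y) makes the bound tight, while any other decoder falls below the true mutual information.

```python
import numpy as np

rng = np.random.default_rng(0)
nx, ny = 4, 3
px = np.full(nx, 1 / nx)                    # source distribution p(x)
pyx = rng.dirichlet(np.ones(ny), size=nx)   # channel p(y|x), one row per x

pxy = px[:, None] * pyx                     # joint p(x, y)
py = pxy.sum(axis=0)                        # marginal p(y)

# Exact mutual information I(X;Y)
I = np.sum(pxy * np.log(pxy / (px[:, None] * py[None, :])))

# Barber-Agakov bound: I(X;Y) >= H(X) + E_{p(x,y)}[log q(x|y)]
Hx = -np.sum(px * np.log(px))
post = pxy / py[None, :]                    # exact posterior p(x|y)
bound_tight = Hx + np.sum(pxy * np.log(post))      # decoder step: tight
bound_loose = Hx + np.sum(pxy * np.log(1 / nx))    # uniform decoder: loose

assert bound_loose <= bound_tight <= I + 1e-12
```

In the full IM algorithm these two half-steps alternate, EM-style: fix the encoder and set q(x|y) to the posterior, then fix q and adjust the encoder parameters by gradient ascent on the bound.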

## 313 Citations

### Variational Information Maximization in Gaussian Channels

- Computer Science
- 2004

This paper shows how the mutual information bound, when applied to this arena, gives PCA solutions, without the need for the Gaussian assumption, and naturally generalizes to providing an objective function for Kernel PCA, enabling the principled selection of kernel parameters.

### Variational Information Maximization and (K)PCA

- Computer Science
- 2004

This paper shows how the mutual information bound, when applied to this arena, gives PCA solutions, without the need for the Gaussian assumption, and naturally generalizes to providing an objective function for Kernel PCA, enabling the principled selection of kernel parameters.
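A sketch of the claim in the two (K)PCA entries above, under assumptions chosen for the demo (synthetic centered data, a noiseless linear encoding y = Wx, and a linear least-squares decoder standing in for the optimal linear-Gaussian q(x|y)): maximizing the mutual information bound over the decoder reduces to minimizing reconstruction error, which the top principal directions achieve.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, k = 5, 2000, 2
A = rng.normal(size=(d, d))
X = rng.normal(size=(n, d)) @ A.T   # synthetic data, anisotropic covariance
X -= X.mean(axis=0)

def bound_proxy(W):
    # With a linear-Gaussian decoder q(x|y) = N(Uy, s^2 I), maximizing the
    # Barber-Agakov bound over U and s reduces (up to constants) to
    # minimizing the least-squares error of reconstructing x from y = W x.
    Y = X @ W                                   # linear encoding
    U, *_ = np.linalg.lstsq(Y, X, rcond=None)   # optimal linear decoder
    return -np.mean((X - Y @ U) ** 2)           # higher = tighter bound

C = np.cov(X.T)
evals, evecs = np.linalg.eigh(C)
W_pca = evecs[:, -k:]                # top-k principal directions
W_rand = rng.normal(size=(d, k))     # arbitrary projection for comparison

assert bound_proxy(W_pca) >= bound_proxy(W_rand)
```

By the Eckart–Young theorem the principal subspace minimizes the rank-k reconstruction error, so no other choice of W scores higher under this proxy.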

### Variational Information Maximization in Stochastic Environments

- Computer Science
- 2006

This work presents a rigorous and general framework for maximizing the mutual information in intrinsically intractable channels, giving rise to simple, stable, and easily generalizable optimization procedures that outperform and supersede many of the common approximate information-maximizing techniques.

### Relevant sparse codes with variational information bottleneck

- Computer Science
- NIPS
- 2016

This work proposes an approximate variational scheme for maximizing a lower bound on the IB objective, analogous to variational EM, and derives an IB algorithm to recover features that are both relevant and sparse.

### Auxiliary Variational Information Maximization for Dimensionality Reduction

- Computer Science
- SLSFS
- 2005

This work introduces a richer family of auxiliary variational bounds on MI, which generalizes previous approximations and shows that the auxiliary variable method may help to significantly improve on reconstructions from noisy lower-dimensional projections.

### Deep Learning for Channel Coding via Neural Mutual Information Estimation

- Computer Science
- 2019 IEEE 20th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC)
- 2019

This work uses a recently proposed neural estimator of mutual information to optimize the encoder for a maximized mutual information, only relying on channel samples, and shows that this approach achieves the same performance as state-of-the-art end-to-end learning with perfect channel model knowledge.
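As a hedged sketch of the underlying idea (not the paper's actual training pipeline): neural estimators such as MINE maximize the Donsker–Varadhan lower bound over a learned critic. Plugging in the analytically optimal critic for a correlated bivariate Gaussian, instead of training a network, shows the bound recovering the closed-form mutual information from samples alone.

```python
import numpy as np

rng = np.random.default_rng(7)
rho, n = 0.6, 200_000

# Correlated bivariate standard normal samples from the joint p(x, y)
x = rng.normal(size=n)
y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)
y_shuf = rng.permutation(y)          # samples from the product p(x)p(y)

def critic(x, y):
    # Optimal critic T*(x, y) = log p(x, y) / (p(x) p(y)) for this
    # Gaussian pair; a neural estimator would learn an approximation to it
    return (-0.5 * np.log(1 - rho**2)
            - rho**2 * (x**2 + y**2) / (2 * (1 - rho**2))
            + rho * x * y / (1 - rho**2))

# Donsker-Varadhan lower bound: E_p[T] - log E_{p(x)p(y)}[exp(T)]
dv = critic(x, y).mean() - np.log(np.mean(np.exp(critic(x, y_shuf))))
true_mi = -0.5 * np.log(1 - rho**2)  # analytic I(X;Y) for correlation rho
```

With the optimal critic the bound is tight, so `dv` matches `true_mi` up to sampling error.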

### Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning

- Computer Science
- NIPS
- 2015

This paper develops a stochastic optimisation algorithm that allows for scalable information maximisation and empowerment-based reasoning directly from pixels to actions on the problem of intrinsically-motivated learning.

### Uncertainty Autoencoders: Learning Compressed Representations via Variational Information Maximization

- Computer Science
- AISTATS
- 2019

This work proposes Uncertainty Autoencoders, a learning framework for unsupervised representation learning inspired by compressed sensing that provides a unified treatment to several lines of research in dimensionality reduction, compressed sensing, and generative modeling.

### DEMI: Discriminative Estimator of Mutual Information

- Computer Science
- ArXiv
- 2020

It is shown theoretically that DEMI and other variational approaches are equivalent at their optimum, even though DEMI itself does not optimize a variational bound.

### Variational Information Maximization for Feature Selection

- Computer Science
- NIPS
- 2016

This work formulates a more flexible and general class of assumptions based on variational distributions and uses them to tractably derive lower bounds on mutual information in a novel information-theoretic framework for feature selection.

## References

Showing 1–10 of 18 references.

### An Information-Maximization Approach to Blind Separation and Blind Deconvolution

- Computer Science
- Neural Computation
- 1995

It is suggested that information maximization provides a unifying framework for problems in "blind" signal processing and dependencies of information transfer on time delays are derived.

### Mutual Information, Fisher Information, and Population Coding

- Computer Science
- Neural Computation
- 1998

It is shown that in the context of population coding, the mutual information between the activity of a large array of neurons and a stimulus to which the neurons are tuned is naturally related to the Fisher information.
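In symbols, the relation referred to is (sketched here from the standard statement of the Brunel–Nadal result for a large population of N neurons):

$$
I(\theta; r) \;\longrightarrow\; H(\theta) - \int p(\theta)\,\frac{1}{2}\log\frac{2\pi e}{J(\theta)}\,d\theta \qquad (N \to \infty),
$$

where $J(\theta)$ is the Fisher information of the population response $r$ about the stimulus $\theta$: the mutual information saturates at the stimulus entropy minus the entropy of a Cramér–Rao-limited Gaussian estimator.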

### An information-theoretic unsupervised learning algorithm for neural networks

- Computer Science
- 1993

This thesis proposes a class of information-theoretic learning algorithms which cause a network to become tuned to spatially coherent features of visual images, and shows that this method works well for learning depth from random dot stereograms of curved surfaces.

### Deriving Receptive Fields Using an Optimal Encoding Criterion

- Computer Science
- NIPS
- 1992

It is shown how infomax, when applied to a class of nonlinear input-output mappings, can under certain conditions generate optimal filters that have additional useful properties.

### Analysis of Bit Error Probability of Direct-Sequence CDMA Multiuser Demodulators

- Business
- NIPS
- 2000

Results of the performance evaluation show the effectiveness of the optimal and mean-field demodulators compared with the conventional one, especially for small information bit rates and low noise levels.

### A new class of upper bounds on the log partition function

- Computer Science, Mathematics
- IEEE Transactions on Information Theory
- 2005

A new class of upper bounds on the log partition function of a Markov random field (MRF) is introduced, based on concepts from convex duality and information geometry, and the Legendre mapping between exponential and mean parameters is exploited.

### Advanced mean field methods: theory and practice

- Computer Science
- 2001

The theoretical foundations of advanced mean field methods are covered, the relation between the different approaches are explored, the quality of the approximation obtained is examined, and their application to various areas of probabilistic modeling is demonstrated.

### Tractable Approximate Belief Propagation

- Geology
- 2001

This chapter contains sections titled: Graphical Models, Undirected Belief Propagation, Directed Belief Propagation, Tractable Implementations of Directed Belief Propagation, Undirected vs Directed…

### Mutual Information in Learning Feature Transformations

- Computer Science
- ICML
- 2000

The work of Principe et al. is extended to mutual information between continuous multidimensional variables and discrete-valued class labels, and Renyi’s quadratic entropy is used.