On the Difference between the Information Bottleneck and the Deep Information Bottleneck

@article{WieczorekRoth_DeepIB,
  title={On the Difference between the Information Bottleneck and the Deep Information Bottleneck},
  author={Aleksander Wieczorek and Volker Roth},
}
Combining the information bottleneck model with deep learning by replacing mutual information terms with deep neural nets has proven successful in areas ranging from generative modelling to interpreting deep neural networks. In this paper, we revisit the deep variational information bottleneck and the assumptions needed for its derivation. The two assumptions made about the data X, Y and their latent representation T take the form of two Markov chains, T−X−Y and X−T−Y. Requiring both to…
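As background for the abstract above, the information bottleneck objective and the variational bounds used by the deep variational information bottleneck (following Alemi et al., 2017) can be sketched as follows, with encoder p(t|x), variational decoder q(y|t), and variational prior r(t):

```latex
% IB: compress X into T while preserving information about Y
\min_{p(t\mid x)} \; I(X;T) - \beta\, I(T;Y)
\quad \text{subject to the Markov chain } T - X - Y .

% Deep VIB: tractable variational bounds on both terms
I(T;Y) \;\ge\; \mathbb{E}_{p(x,y)\,p(t\mid x)}\!\left[\log q(y\mid t)\right] + H(Y),
\qquad
I(X;T) \;\le\; \mathbb{E}_{p(x)}\!\left[\mathrm{KL}\!\left(p(t\mid x)\,\|\,r(t)\right)\right].
```

Plugging the bounds into the objective yields the familiar VIB training loss, a cross-entropy term plus a β-weighted KL term.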
On the Information Bottleneck Problems: Models, Connections, Applications and Information Theoretic Views
This tutorial paper focuses on variants of the bottleneck problem from an information-theoretic perspective and discusses practical methods to solve them, as well as connections to coding and…
Information Bottleneck Analysis by a Conditional Mutual Information Bound
It is demonstrated that the conditional mutual information I(z;x|y) provides an alternative upper bound for I(z;n), and that this bound is applicable even if z is not a sufficient representation of x, that is, even if I(z;y) ≠ I(x;y).
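Bounds of this kind rest on the chain rule for mutual information, I(z; x, y) = I(z; y) + I(z; x | y). A minimal numerical check on a toy discrete joint distribution (the distribution and all probability values are hypothetical, chosen only for illustration):

```python
from math import log2

# Toy joint distribution p(z, x, y) over binary variables; sums to 1.
p = {
    (0, 0, 0): 0.20, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.15, (1, 1, 0): 0.10, (1, 1, 1): 0.30,
}

def marg(keep):
    """Marginalise p onto the given index positions (0=z, 1=x, 2=y)."""
    m = {}
    for zxy, pr in p.items():
        k = tuple(zxy[i] for i in keep)
        m[k] = m.get(k, 0.0) + pr
    return m

def mi(a_idx, b_idx):
    """I(A;B) for the variable groups at positions a_idx and b_idx."""
    pab, pa, pb = marg(a_idx + b_idx), marg(a_idx), marg(b_idx)
    total = 0.0
    for ab, pr in pab.items():
        if pr > 0:
            a, b = ab[:len(a_idx)], ab[len(a_idx):]
            total += pr * log2(pr / (pa[a] * pb[b]))
    return total

def cond_mi(a_idx, b_idx, c_idx):
    """I(A;B|C), computed directly from the joint distributions."""
    pabc = marg(a_idx + b_idx + c_idx)
    pac, pbc, pc = marg(a_idx + c_idx), marg(b_idx + c_idx), marg(c_idx)
    na, nb = len(a_idx), len(b_idx)
    total = 0.0
    for abc, pr in pabc.items():
        if pr > 0:
            a, b, c = abc[:na], abc[na:na + nb], abc[na + nb:]
            total += pr * log2(pr * pc[c] / (pac[a + c] * pbc[b + c]))
    return total

# Chain rule: I(z; x, y) = I(z; y) + I(z; x | y)
lhs = mi((0,), (1, 2))
rhs = mi((0,), (2,)) + cond_mi((0,), (1,), (2,))
assert abs(lhs - rhs) < 1e-9  # holds numerically
```

Because I(z; x, y) ≥ I(z; y), the conditional term I(z; x | y) isolates exactly the information about x that is not already explained by y, which is what makes it usable as a bound on nuisance information.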
A Comparison of Variational Bounds for the Information Bottleneck Functional
This work sheds light on the variational bounds proposed in Alemi et al. (2017) and Fischer (2020) for the information bottleneck (IB) and the conditional entropy bottleneck (CEB) functionals by showing that, in the most general setting, no ordering can be established between these variational bounds.
Learning Conditional Invariance through Cycle Consistency
This work proposes a novel approach to cycle consistency based on the deep information bottleneck and, in contrast to other approaches, allows using continuous target properties and provides inherent model selection capabilities.
On Learning Prediction-Focused Mixtures
This work introduces prediction-focused modeling for mixtures, which automatically selects the dimensions relevant to the prediction task, identifies relevant signal from the input, outperforms models that are not prediction-focused, and is easy to optimize.
Inverse Learning of Symmetry Transformations
This work proposes learning two latent subspaces, where the first subspace captures the property and the second subspace the remaining invariant information, based on the deep information bottleneck principle in combination with a mutual information regulariser.
Information Bottleneck for Estimating Treatment Effects with Systematically Missing Covariates
This paper trains an information bottleneck to perform a low-dimensional compression of covariates by explicitly considering the relevance of information to treatment effects, and shows that the method can reliably and accurately estimate treatment effects even in the absence of a full set of covariate information at test time.
Prediction-focused Mixture Models
This work introduces the prediction-focused mixture model, which selects and models input features relevant to predicting the targets and demonstrates that this approach identifies relevant signal from inputs even when the model is highly misspecified.
Inverse Learning of Symmetries
This work proposes to learn the symmetry transformation with a model consisting of two latent subspaces, where the first subspace captures the target and the second subspace the remaining invariant information, based on the deep information bottleneck in combination with a continuous mutual information regulariser.


Deep learning and the information bottleneck principle
It is argued that the optimal architecture (the number of layers and the features/connections at each layer) is related to the bifurcation points of the information bottleneck tradeoff, namely relevant compression of the input layer with respect to the output layer.
On the Information Bottleneck Theory of Deep Learning
This paper presents a comprehensive theory of large-scale learning with Deep Neural Networks (DNNs), when optimized with Stochastic Gradient Descent (SGD), built on three theoretical components.
Information Dropout: Learning Optimal Representations Through Noisy Computation
It is proved that Information Dropout achieves a comparable or better generalization performance than binary dropout, especially on smaller models, since it can automatically adapt the noise to the structure of the network, as well as to the test sample.
An Information-Theoretic Analysis of Deep Latent-Variable Models
An information-theoretic framework is presented for understanding trade-offs in unsupervised learning of deep latent-variable models using variational inference, and it is shown how this framework sheds light on many recently proposed extensions to the variational autoencoder family.
Opening the Black Box of Deep Neural Networks via Information
This work demonstrates the effectiveness of the Information-Plane visualization of DNNs and shows that the training time is dramatically reduced when adding more hidden layers, and the main advantage of the hidden layers is computational.
Emergence of Invariance and Disentanglement in Deep Representations
It is shown that in a deep neural network invariance to nuisance factors is equivalent to information minimality of the learned representation, and that stacking layers and injecting noise during training naturally bias the network towards learning invariant representations.
InfoVAE: Balancing Learning and Inference in Variational Autoencoders
It is shown that the proposed InfoVAE model can significantly improve the quality of the variational posterior and can make effective use of the latent features regardless of the flexibility of the decoding distribution.
Learning Sparse Latent Representations with the Deep Copula Information Bottleneck
This paper adopts the deep information bottleneck model, identifies its shortcomings and proposes a model that circumvents them, and applies a copula transformation which restores the invariance properties of the information bottleneck method and leads to disentanglement of the features in the latent space.
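The copula transformation mentioned above can be approximated empirically by a rank-based normal-scores transform, which Gaussianises each marginal while leaving rank-based dependence untouched. A minimal sketch (the function name and the rank-based estimator are illustrative assumptions, not the paper's exact procedure):

```python
from statistics import NormalDist
import random

def normal_scores(xs):
    """Map a sample to standard-normal scores via ranks: a strictly
    monotone transformation of each marginal, i.e. the empirical
    version of a Gaussian copula transform."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r                       # rank in 1..n
    nd = NormalDist()                      # standard normal
    # r/(n+1) keeps quantiles strictly inside (0, 1)
    return [nd.inv_cdf(r / (n + 1)) for r in ranks]

random.seed(0)
x = [random.expovariate(1.0) for _ in range(1000)]  # skewed marginal
z = normal_scores(x)
# z has approximately standard-normal marginals and preserves the
# ordering of x, so rank (copula) dependence with other variables
# is unchanged while the marginal distribution is normalised
```

Because the transform is monotone per dimension, quantities that depend only on the copula (such as the mutual information between continuous variables) are invariant under it, which is the invariance property the paper exploits.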
Gaussian Lower Bound for the Information Bottleneck Limit
A Gaussian lower bound to the IB curve is introduced, and it is shown that the optimal Gaussian embedding is bounded from above by non-linear CCA, which gives a fundamental limit on the ability to Gaussianize arbitrary data sets and solve complex problems by linear methods.
Fixing a Broken ELBO
This work derives variational lower and upper bounds on the mutual information between the input and the latent variable, and uses these bounds to derive a rate-distortion curve that characterizes the tradeoff between compression and reconstruction accuracy.
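The rate-distortion view referenced above decomposes the negative ELBO into a rate term R and a distortion term D; a sketch of the standard decomposition (the symbols e, d, m for encoder, decoder, and variational marginal follow Alemi et al.'s notation):

```latex
D = -\,\mathbb{E}_{p(x)\,e(z\mid x)}\!\left[\log d(x\mid z)\right],
\qquad
R = \mathbb{E}_{p(x)}\!\left[\mathrm{KL}\!\left(e(z\mid x)\,\|\,m(z)\right)\right],

H - D \;\le\; I(X;Z) \;\le\; R,
\qquad
\mathrm{ELBO} = -(D + R),
```

where H is the entropy of the data. Since the ELBO only constrains the sum D + R, many models with very different rates achieve the same ELBO, which is the degeneracy the paper's rate-distortion curve makes explicit.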