Corpus ID: 219708291

Understanding and mitigating exploding inverses in invertible neural networks

@inproceedings{behrmann2021understanding,
  title={Understanding and mitigating exploding inverses in invertible neural networks},
  author={Jens Behrmann and Paul Vicol and Kuan-Chieh Wang and Roger Baker Grosse and J{\"o}rn-Henrik Jacobsen},
  booktitle={International Conference on Artificial Intelligence and Statistics},
  year={2021}
}
Invertible neural networks (INNs) have been used to design generative models, implement memory-saving gradient computation, and solve inverse problems. In this work, we show that commonly-used INN architectures suffer from exploding inverses and are thus prone to becoming numerically non-invertible. Across a wide range of INN use-cases, we reveal failures including the non-applicability of the change-of-variables formula on in- and out-of-distribution (OOD) data, incorrect gradients for memory… 
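
This failure mode can be reproduced in a few lines. In an affine coupling layer the inverse exists in exact arithmetic, but its Lipschitz constant grows exponentially with the scale, so float32 round-off in the forward pass can destroy the reconstruction. A minimal sketch (a hypothetical 2-D toy, not the paper's code):

```python
import numpy as np

# Toy affine coupling layer:
#   y1 = x1,   y2 = x2 * exp(s * x1) + x1
# The analytic inverse multiplies by exp(-s * y1), whose magnitude
# explodes for strongly negative s. In float32, the forward pass then
# annihilates x2 * exp(s * x1) against x1, and the round trip fails.

def forward(x1, x2, s):
    return x1, x2 * np.exp(s * x1) + x1

def inverse(y1, y2, s):
    return y1, (y2 - y1) * np.exp(-s * y1)

x1, x2 = np.float32(1.0), np.float32(1.0)

errors = {}
for s in (-1.0, -50.0):
    s = np.float32(s)
    y1, y2 = forward(x1, x2, s)
    _, x2_rec = inverse(y1, y2, s)
    errors[float(s)] = abs(float(x2_rec) - float(x2))

print(errors)  # for s = -50 the input is lost entirely
```

For s = -1 the round trip is accurate to float32 precision; for s = -50 the reconstructed x2 is 0 instead of 1, even though the layer is invertible on paper.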

Certified Invertibility in Neural Networks via Mixed-Integer Programming

This work characterizes non-invertibility through the lens of mathematical optimization, in which the global solution quantifies the “safety” of the network predictions: their distance from the non-invertibility boundary.

Deep Neural Networks are Surprisingly Reversible: A Baseline for Zero-Shot Inversion

A zero-shot direct model inversion framework that recovers the input to the trained model given only the internal representation, and empirically shows that modern classification models on ImageNet can be inverted, allowing an approximate recovery of the original 224×224 px images from a representation after more than 20 layers.

Diagnosing and Fixing Manifold Overfitting in Deep Generative Models

This paper proposes a class of two-step procedures consisting of a dimensionality reduction step followed by maximum-likelihood density estimation, and proves that they recover the data-generating distribution in the nonparametric regime, thus avoiding manifold overfitting.

Stabilizing invertible neural networks using mixture models

This analysis indicates that changing the latent distribution from a standard normal one to a Gaussian mixture model resolves the issue of exploding Lipschitz constants and leads to significantly improved sampling quality in multimodal applications.

Can Push-forward Generative Models Fit Multimodal Distributions?

This work shows that the Lipschitz constant of these generative networks has to be large in order to approximate multimodal distributions, and empirically shows that generative models consisting of stacked networks with stochastic input at each step, such as diffusion models, do not suffer from this limitation.
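
A one-dimensional computation makes the required growth concrete. Assuming the monotone transport map g pushing N(0, 1) onto the mixture 0.5 N(-d, 1) + 0.5 N(d, 1), the identity g'(x) = phi(x) / p_mix(g(x)) together with g(0) = 0 by symmetry gives the lower bound Lip(g) >= phi(0) / p_mix(0) = exp(d^2 / 2). A toy computation (our notation, not code from the paper):

```python
import numpy as np

# Lower bound on the Lipschitz constant of the monotone map pushing
# N(0,1) onto an equal-weight two-Gaussian mixture with modes at +-d:
# g'(0) = phi(0) / p_mix(0), which equals exp(d^2 / 2).

def phi(x):
    return np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

def p_mix(x, d):
    return 0.5 * phi(x - d) + 0.5 * phi(x + d)

bounds = {d: phi(0.0) / p_mix(0.0, d) for d in (1.0, 5.0, 10.0)}
for d, b in bounds.items():
    print(f"mode separation d={d}: Lip(g) >= {b:.3e}")
```

Already at d = 10 the bound exceeds 10^21, illustrating why a single Lipschitz-constrained push-forward network struggles with well-separated modes.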

Structure-preserving deep learning

A number of directions in deep learning are reviewed: some deep neural networks can be understood as discretisations of dynamical systems, neural networks can be designed to have desirable properties such as invertibility or group equivariance, and new algorithmic frameworks based on conformal Hamiltonian systems and Riemannian manifolds have been proposed to solve the optimisation problems.


This work explores incorporating arbitrary dependency structures, as specified by Bayesian networks, into VAEs by extending both the prior and inference network with graphical residual flows—residual flows that encode conditional independence by masking the weight matrices of the flow’s residual blocks.

Training Normalizing Flows with the Information Bottleneck for Competitive Generative Classification

This work develops the theory and methodology of IB-INNs, a class of conditional normalizing flows where INNs are trained using the IB objective, and finds that the IB trade-off parameter controls a mix of generative capabilities and accuracy close to that of standard classifiers.

Universal Approximation for Log-concave Distributions using Well-conditioned Normalizing Flows

It is shown that any log-concave distribution can be approximated using well-conditioned affine-coupling flows, and deep connections between affine coupling architectures, underdamped Langevin dynamics, and Hénon maps are uncovered.

Convolutional Proximal Neural Networks and Plug-and-Play Algorithms

Reversible Architectures for Arbitrarily Deep Residual Neural Networks

From this interpretation, a theoretical framework on stability and reversibility of deep neural networks is developed, and three reversible neural network architectures that can go arbitrarily deep in theory are derived.
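
The recomputation idea behind such reversible architectures can be sketched with a two-state additive coupling: each layer's input is recovered exactly from its output, so no activations need to be stored during backpropagation. A generic numpy sketch (f and g stand in for arbitrary sub-networks; not the paper's code):

```python
import numpy as np

# Two-state reversible residual update:
#   y1 = h1 + f(h2),   y2 = h2 + g(y1)
# The inverse below reconstructs (h1, h2) from (y1, y2) alone.

rng = np.random.default_rng(1)
Wf, Wg = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))

def f(h): return np.tanh(Wf @ h)
def g(h): return np.tanh(Wg @ h)

def forward(h1, h2):
    y1 = h1 + f(h2)
    y2 = h2 + g(y1)
    return y1, y2

def backward(y1, y2):          # exact inverse, no stored activations
    h2 = y2 - g(y1)
    h1 = y1 - f(h2)
    return h1, h2

h1, h2 = rng.standard_normal(3), rng.standard_normal(3)
r1, r2 = backward(*forward(h1, h2))
print(np.max(np.abs(r1 - h1)), np.max(np.abs(r2 - h2)))
```

Because each subtraction undoes the corresponding addition, the layer can be made arbitrarily deep with constant activation memory.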

Analyzing Inverse Problems with Invertible Neural Networks

It is argued that a particular class of neural networks is well suited for this task -- so-called Invertible Neural Networks (INNs), and it is verified experimentally that INNs are a powerful analysis tool to find multi-modalities in parameter space, to uncover parameter correlations, and to identify unrecoverable parameters.

i-RevNet: Deep Invertible Networks

The i-RevNet is built, a network that can be fully inverted up to the final projection onto the classes, i.e. no information is discarded, and linear interpolations between natural image representations are reconstructed.

MintNet: Building Invertible Neural Networks with Masked Convolutions

We propose a new way of constructing invertible neural networks by combining simple building blocks with a novel set of composition rules. This leads to a rich set of invertible architectures.
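
One mechanism behind invertible masked convolutions: a causal mask makes the Jacobian triangular, so the layer can be inverted coordinate by coordinate via forward substitution. A linear 1-D sketch (our toy, with a unit diagonal to guarantee invertibility; MintNet's actual blocks are nonlinear):

```python
import numpy as np

# A causally masked linear layer is a lower-triangular matrix; invert
# it sequentially, solving for one coordinate at a time.

n = 6
rng = np.random.default_rng(2)
L = np.tril(rng.standard_normal((n, n)), k=-1) + np.eye(n)  # unit diagonal

def forward(x):
    return L @ x

def inverse(y):
    x = np.zeros_like(y)
    for i in range(n):          # forward substitution
        x[i] = y[i] - L[i, :i] @ x[:i]
    return x

x = rng.standard_normal(n)
x_rec = inverse(forward(x))
print(np.max(np.abs(x_rec - x)))
```

The sequential solve costs one pass per coordinate, which is why such inverses are exact but slower than the masked forward pass.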

Analysis of Invariance and Robustness via Invertibility of ReLU-Networks

A theoretically motivated approach is derived to explore the preimages of ReLU layers and the mechanisms affecting the stability of the inverse of DNNs, and it is shown how this approach uncovers characteristic properties of the network.

Residual Flows for Invertible Generative Modeling

The resulting approach, called Residual Flows, achieves state-of-the-art performance on density estimation amongst flow-based models, and outperforms networks that use coupling blocks at joint generative and discriminative modeling.

iUNets: Fully invertible U-Nets with Learnable Up- and Downsampling

A new fully-invertible U-Net-based architecture called the iUNet is presented, which employs novel learnable and invertible up- and downsampling operations, thereby making the use of memory-efficient backpropagation possible.

Invertible Convolutional Flow

This work investigates a set of novel normalizing flows based on the circular and symmetric convolutions and proposes an analytic approach to designing nonlinear elementwise bijectors that induce special properties in the intermediate layers, by implicitly introducing specific regularizers in the loss.

Invertible Residual Networks

The empirical evaluation shows that invertible ResNets perform competitively with both state-of-the-art image classifiers and flow-based generative models, something that has not been previously achieved with a single architecture.
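
The inverse of a residual block x -> x + g(x) has no closed form, but when Lip(g) < 1 (enforced in the paper via spectral normalization) the fixed-point iteration x <- y - g(x) converges to it. A minimal sketch with a hand-scaled weight matrix (not the authors' implementation):

```python
import numpy as np

# i-ResNet-style block F(x) = x + g(x) with Lip(g) <= 0.9 < 1,
# inverted by Banach fixed-point iteration.

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
W *= 0.9 / np.linalg.norm(W, 2)    # force the spectral norm to 0.9

def g(x):
    return np.tanh(W @ x)          # tanh is 1-Lipschitz, so Lip(g) <= 0.9

def forward(x):
    return x + g(x)

def inverse(y, n_iter=200):
    x = y.copy()
    for _ in range(n_iter):
        x = y - g(x)               # contraction: error shrinks by 0.9 per step
    return x

x = rng.standard_normal(4)
err = np.max(np.abs(inverse(forward(x)) - x))
print(err)
```

The geometric convergence rate is governed by the Lipschitz bound, so tightening the spectral-norm constraint trades expressiveness for cheaper, better-conditioned inversion.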

Reversible Recurrent Neural Networks

This work shows that perfectly reversible RNNs, which require no storage of the hidden activations, are fundamentally limited, and provides a scheme for storing a small number of bits in order to allow perfect reversal with forgetting.