# Understanding and mitigating exploding inverses in invertible neural networks

@inproceedings{Behrmann2021UnderstandingAM,
  title={Understanding and mitigating exploding inverses in invertible neural networks},
  author={Jens Behrmann and Paul Vicol and Kuan-Chieh Wang and Roger Baker Grosse and J{\"o}rn-Henrik Jacobsen},
  booktitle={International Conference on Artificial Intelligence and Statistics},
  year={2021}
}

Invertible neural networks (INNs) have been used to design generative models, implement memory-saving gradient computation, and solve inverse problems. In this work, we show that commonly-used INN architectures suffer from exploding inverses and are thus prone to becoming numerically non-invertible. Across a wide range of INN use-cases, we reveal failures including the non-applicability of the change-of-variables formula on in- and out-of-distribution (OOD) data, incorrect gradients for memory…
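The failure mode the paper studies can be seen in a few lines. The sketch below (an illustration, not the paper's code) builds a toy affine coupling layer, the building block of RealNVP-style INNs, with fixed scalars standing in for the learned scale/shift subnetworks. With an extreme scale, the forward pass squashes part of the input below float64 resolution, so the analytically exact inverse fails numerically:

```python
import numpy as np

# Toy affine coupling layer: split x, transform the second half.
# s and t are fixed scalars standing in for learned subnetworks.
def forward(x, s, t):
    x1, x2 = x[:1], x[1:]
    return np.concatenate([x1, x2 * np.exp(s) + t])

def inverse(y, s, t):
    y1, y2 = y[:1], y[1:]
    return np.concatenate([y1, (y2 - t) * np.exp(-s)])

x = np.array([0.5, 3.0])

# Moderate scale: the round-trip is exact to machine precision.
y = forward(x, s=0.1, t=1.0)
print(np.max(np.abs(inverse(y, 0.1, 1.0) - x)))  # ~1e-15

# Extreme scale: x2 * exp(-40) falls below float64 resolution
# relative to t, so the inverse recovers 0 instead of 3.0 --
# the layer is analytically invertible but numerically not.
y = forward(x, s=-40.0, t=1.0)
print(np.max(np.abs(inverse(y, -40.0, 1.0) - x)))  # 3.0
```

The same mechanism scales up: when a trained coupling network outputs large-magnitude scales (e.g. on OOD inputs), reconstruction error explodes even though the architecture is invertible on paper.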

## 47 Citations

### The Effects of Invertibility on the Representational Complexity of Encoders in Variational Autoencoders

- Computer Science, ICLR
- 2022

It is proved that there exist noninvertible generative maps, for which the encoding direction needs to be exponentially larger (under standard assumptions in computational complexity), which provides theoretical support for the empirical wisdom that learning deep generative models is harder when data lies on a low-dimensional manifold.

### Deep Neural Networks are Surprisingly Reversible: A Baseline for Zero-Shot Inversion

- Computer Science, ArXiv
- 2021

A zero-shot direct model inversion framework that recovers the input to the trained model given only the internal representation, and empirically shows that modern classification models on ImageNet can be inverted, allowing an approximate recovery of the original 224×224 px images from a representation after more than 20 layers.

### Diagnosing and Fixing Manifold Overfitting in Deep Generative Models

- Computer Science, ArXiv
- 2022

This paper proposes a class of two-step procedures consisting of a dimensionality reduction step followed by maximum-likelihood density estimation, and proves that they recover the data-generating distribution in the nonparametric regime, thus avoiding manifold overfitting.

### Stabilizing invertible neural networks using mixture models

- Mathematics, ArXiv
- 2020

This analysis indicates that changing the latent distribution from a standard normal one to a Gaussian mixture model resolves the issue of exploding Lipschitz constants and leads to significantly improved sampling quality in multimodal applications.

### Can Push-forward Generative Models Fit Multimodal Distributions?

- Computer Science, ArXiv
- 2022

This work shows that the Lipschitz constant of these generative networks must be large in order to approximate multimodal distributions, and empirically shows that generative models consisting of stacked networks with stochastic input at each step, such as diffusion models, do not suffer from such limitations.

### SIReN-VAE: Leveraging Flows and Amortized Inference for Bayesian Networks

- Computer Science, ArXiv
- 2022

This work explores incorporating arbitrary dependency structures, as specified by Bayesian networks, into VAEs by extending both the prior and inference network with graphical residual flows, i.e. residual flows that encode conditional independence by masking the weight matrices of the flow's residual blocks.

### Structure-preserving deep learning

- Computer Science, European Journal of Applied Mathematics
- 2021

A number of directions in deep learning are reviewed: some deep neural networks can be understood as discretisations of dynamical systems, neural networks can be designed to have desirable properties such as invertibility or group equivariance, and new algorithmic frameworks based on conformal Hamiltonian systems and Riemannian manifolds to solve the optimisation problems have been proposed.

### Training Normalizing Flows with the Information Bottleneck for Competitive Generative Classification

- Computer Science, NeurIPS
- 2020

This work develops the theory and methodology of IB-INNs, a class of conditional normalizing flows where INNs are trained using the IB objective, and finds that the trade-off parameter in the IB objective controls a balance between generative capabilities and accuracy close to that of standard classifiers.

### Universal Approximation for Log-concave Distributions using Well-conditioned Normalizing Flows

- Mathematics, ArXiv
- 2021

It is shown that any log-concave distribution can be approximated using well-conditioned affine-coupling flows, and deep connections between affine coupling architectures, underdamped Langevin dynamics and Hénon maps are uncovered.

## References

Showing 1–10 of 65 references

### Reversible Architectures for Arbitrarily Deep Residual Neural Networks

- Computer Science, AAAI
- 2018

From this interpretation, a theoretical framework on stability and reversibility of deep neural networks is developed, and three reversible neural network architectures that can go arbitrarily deep in theory are derived.

### Analyzing Inverse Problems with Invertible Neural Networks

- Computer Science, ICLR
- 2019

It is argued that a particular class of neural networks is well suited for this task -- so-called Invertible Neural Networks (INNs), and it is verified experimentally that INNs are a powerful analysis tool to find multi-modalities in parameter space, to uncover parameter correlations, and to identify unrecoverable parameters.

### i-RevNet: Deep Invertible Networks

- Computer Science, ICLR
- 2018

The i-RevNet is built, a network that can be fully inverted up to the final projection onto the classes, i.e. no information is discarded, and linear interpolations between natural image representations are reconstructed.

### MintNet: Building Invertible Neural Networks with Masked Convolutions

- Computer Science, NeurIPS
- 2019

We propose a new way of constructing invertible neural networks by combining simple building blocks with a novel set of composition rules. This leads to a rich set of invertible architectures,…

### Analysis of Invariance and Robustness via Invertibility of ReLU-Networks

- Computer Science, ArXiv
- 2018

A theoretically motivated approach is derived to explore the preimages of ReLU layers and the mechanisms affecting the stability of the inverse of DNNs, and it is shown how this approach uncovers characteristic properties of the network.

### Residual Flows for Invertible Generative Modeling

- Mathematics, NeurIPS
- 2019

The resulting approach, called Residual Flows, achieves state-of-the-art performance on density estimation amongst flow-based models, and outperforms networks that use coupling blocks on joint generative and discriminative modeling.

### iUNets: Fully invertible U-Nets with Learnable Up- and Downsampling

- Computer Science, ArXiv
- 2020

A new fully-invertible U-Net-based architecture called the iUNet is presented, which employs novel learnable and invertible up- and downsampling operations, thereby making the use of memory-efficient backpropagation possible.

### Invertible Convolutional Flow

- Computer Science, NeurIPS
- 2019

This work investigates a set of novel normalizing flows based on circular and symmetric convolutions, and proposes an analytic approach to designing nonlinear elementwise bijectors that induce special properties in the intermediate layers, by implicitly introducing specific regularizers in the loss.

### Invertible Residual Networks

- Computer Science, ICML
- 2019

The empirical evaluation shows that invertible ResNets perform competitively with both state-of-the-art image classifiers and flow-based generative models, something that has not been previously achieved with a single architecture.
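Invertible ResNets recover the inverse of a residual block y = x + g(x) by fixed-point iteration, which converges whenever the residual branch g is a contraction (Lipschitz constant below 1, enforced in the paper by spectral normalization). A minimal numpy sketch of this idea, with a fixed contractive linear map standing in for a spectrally normalized subnetwork (an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
W *= 0.5 / np.linalg.norm(W, 2)   # rescale to spectral norm 0.5 < 1

def g(x):
    # tanh is 1-Lipschitz, so Lip(g) <= ||W||_2 = 0.5: a contraction.
    return np.tanh(W @ x)

def forward(x):
    return x + g(x)

def inverse(y, n_iter=50):
    # Banach fixed-point iteration x_{k+1} = y - g(x_k); error shrinks
    # by the contraction factor (here 0.5) at every step.
    x = y.copy()
    for _ in range(n_iter):
        x = y - g(x)
    return x

x = rng.standard_normal(4)
y = forward(x)
print(np.max(np.abs(inverse(y) - x)))  # reconstruction error near machine precision
```

This is exactly why the Lipschitz constraint matters for stability: with the contraction factor bounded away from 1, the inverse iteration converges geometrically, whereas architectures without such a bound can have well-defined but numerically exploding inverses.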

### Reversible Recurrent Neural Networks

- Computer Science, NeurIPS
- 2018

This work shows that perfectly reversible RNNs, which require no storage of the hidden activations, are fundamentally limited, and provides a scheme for storing a small number of bits in order to allow perfect reversal with forgetting.