# Characterizing Well-behaved vs. Pathological Deep Neural Network Architectures

@article{Labatie2018CharacterizingWV, title={Characterizing Well-behaved vs. Pathological Deep Neural Network Architectures}, author={Antoine Labatie}, journal={ArXiv}, year={2018}, volume={abs/1811.03087} }

We introduce a principled approach, requiring only mild assumptions, for the characterization of deep neural networks at initialization. Our approach applies both to fully-connected and convolutional networks and incorporates the commonly used techniques of batch normalization and skip-connections. Our key insight is to consider the evolution with depth of statistical moments of signal and sensitivity, thereby characterizing the well-behaved or pathological behaviour of input-output mappings…
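As a rough illustration of what "evolution with depth of statistical moments" can look like, the sketch below tracks the second moment of the pre-activations, layer by layer, in a randomly initialized fully-connected ReLU network. It is a minimal sketch under stated assumptions, not the paper's method: the function name `second_moments`, the parameter `var_scale`, the zero biases, and the single Gaussian input are all choices made here for illustration. It omits batch normalization, skip-connections, and the sensitivity moments mentioned in the abstract.

```python
import math
import random


def second_moments(depth, width, var_scale, seed=0):
    """Mean squared pre-activation at each layer of a random ReLU MLP.

    Hypothetical helper for illustration only. Weights are drawn i.i.d.
    from N(0, var_scale / width) and biases are zero.
    """
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) for _ in range(width)]  # one random input vector
    std = math.sqrt(var_scale / width)
    moments = []
    for _ in range(depth):
        # Pre-activations z = W h, with a freshly sampled weight matrix W.
        z = [sum(rng.gauss(0.0, std) * h_j for h_j in h) for _ in range(width)]
        moments.append(sum(v * v for v in z) / width)
        h = [max(v, 0.0) for v in z]  # ReLU
    return moments


# Compare two weight-variance scalings using the same random draws.
he = second_moments(depth=10, width=64, var_scale=2.0)
small = second_moments(depth=10, width=64, var_scale=1.0)
for layer, (a, b) in enumerate(zip(he, small), start=1):
    print(f"layer {layer:2d}  var_scale=2: {a:.4f}  var_scale=1: {b:.6f}")
```

Because ReLU is positively homogeneous and both runs share a seed, the `var_scale=1` moment at layer l equals the `var_scale=2` moment multiplied by 2^-l. In this toy setting the signal therefore shrinks geometrically with depth under the smaller scaling, while under `var_scale=2` it stays roughly constant in expectation, up to finite-width fluctuations.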

## References

### Characterizing Well-Behaved vs. Pathological Deep Neural Networks

- Computer Science, ICML
- 2019

A novel approach, requiring only mild assumptions, is introduced for the characterization of deep neural networks at initialization. It considers the evolution with depth of statistical moments of signal and noise, thereby characterizing the presence or absence of pathologies in the hypothesis space encoded by the choice of hyperparameters.

### Avoiding pathologies in very deep networks

- Computer Science, AISTATS
- 2014

It is shown that in standard architectures, the representational capacity of the network tends to capture fewer degrees of freedom as the number of layers increases, retaining only a single degree of freedom in the limit.

### Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks

- Computer Science, ICML
- 2018

This work demonstrates that it is possible to train vanilla CNNs with ten thousand layers or more simply by using an appropriate initialization scheme, and presents an algorithm for generating such random initial orthogonal convolution kernels.

### On the Expressive Power of Deep Neural Networks

- Computer Science, ICML
- 2017

We propose a new approach to the problem of neural network expressivity, which seeks to characterize how structural properties of a neural network family affect the functions it is able to compute.…

### Identity Mappings in Deep Residual Networks

- Computer Science, ECCV
- 2016

The propagation formulations behind the residual building blocks suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation.

### Collapse of Deep and Narrow Neural Nets

- Computer Science, ArXiv
- 2018

This work demonstrates the collapse of deep and narrow NNs to erroneous states both numerically and theoretically, provides estimates of the probability of collapse, and constructs a diagram of a safe region for designing NNs that avoid it.

### Deep Convolutional Networks as shallow Gaussian Processes

- Computer Science, ICLR
- 2019

We show that the output of a (residual) convolutional neural network (CNN) with an appropriate prior over the weights and biases is a Gaussian process (GP) in the limit of infinitely many…

### Gradients explode - Deep Networks are shallow - ResNet explained

- Computer Science, ICLR
- 2018

The *residual trick* is devised, which reveals that introducing skip connections simplifies the network mathematically, and that this simplicity may be the major cause for their success.

### A Closer Look at Memorization in Deep Networks

- Computer Science, ICML
- 2017

The analysis suggests that the notions of effective capacity which are dataset independent are unlikely to explain the generalization performance of deep networks when trained with gradient based methods because training data itself plays an important role in determining the degree of memorization.

### Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice

- Computer Science, NIPS
- 2017

This work uses powerful tools from free probability theory to compute analytically the entire singular value distribution of a deep network's input-output Jacobian, and reveals that controlling the entire distribution of Jacobian singular values is an important design consideration in deep learning.