Corpus ID: 237057359

Doubly infinite residual neural networks: a diffusion process approach

@article{Peluchetti2021DoublyIR,
  title={Doubly infinite residual neural networks: a diffusion process approach},
  author={Stefano Peluchetti and Stefano Favaro and Philipp Hennig},
  journal={J. Mach. Learn. Res.},
  year={2021},
  volume={22},
  pages={175:1-175:48}
}
Modern neural networks featuring a large number of layers (depth) and units per layer (width) have achieved remarkable performance across many domains. While there exists a vast literature on the interplay between infinitely wide neural networks and Gaussian processes, little is known about analogous interplays with respect to infinitely deep neural networks. Neural networks with independent and identically distributed (i.i.d.) initializations exhibit undesirable forward and backward… 
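As a rough sketch of the setup described above (notation mine, not necessarily the paper's exact parameterization), a residual network with depth-scaled i.i.d. parameters can be written as

\[
  x^{l+1} = x^{l} + f\!\left(x^{l};\, \theta^{l}\right),
  \qquad \theta^{l} \overset{\text{i.i.d.}}{\sim} \mathcal{N}\!\left(0,\, \tfrac{\sigma_w^{2}}{L} I\right),
  \qquad l = 0, \dots, L-1,
\]

and as the depth L grows the (suitably interpolated) hidden states behave like a discretized diffusion, i.e. an Euler–Maruyama scheme for a stochastic differential equation of the form

\[
  \mathrm{d}X_t = \mu(X_t)\,\mathrm{d}t + \sigma(X_t)\,\mathrm{d}B_t, \qquad t \in [0, 1].
\]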

References

Showing 1–10 of 45 references

Infinitely deep neural networks as diffusion processes

This work considers parameter distributions that shrink as the number of layers increases, in order to recover well-behaved stochastic processes in the limit of infinite depth, thereby establishing a link between infinitely deep residual networks and solutions to stochastic differential equations.
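As a toy illustration of this shrinking-parameter idea (an assumption-laden sketch, not the construction used in the paper), one can simulate a residual network whose per-layer parameter variance scales as 1/depth and observe that the forward pass stays well behaved as depth grows:

```python
import numpy as np

def residual_forward(x0, depth=1000, width=64, sigma=1.0, seed=0):
    """Toy residual network whose per-layer parameter variance shrinks as
    1/depth, so the hidden-state trajectory resembles a discretized diffusion
    (an Euler-Maruyama-like scheme). Illustrative sketch only, not the
    parameterization used in the paper."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    path = [x.copy()]
    for _ in range(depth):
        # i.i.d. Gaussian weights and biases with variance O(1/depth)
        W = rng.normal(0.0, sigma / np.sqrt(width * depth), size=(width, width))
        b = rng.normal(0.0, sigma / np.sqrt(depth), size=width)
        x = x + np.tanh(W @ x + b)  # residual update: x_{l+1} = x_l + f(x_l; theta_l)
        path.append(x.copy())
    return np.stack(path)

path = residual_forward(np.ones(64))
print(path.shape)               # (1001, 64): depth + 1 states of width 64
print(np.abs(path[-1]).mean())  # stays O(1); the increments shrink with depth
```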

Deep Neural Networks as Gaussian Processes

The exact equivalence between infinitely wide deep networks and GPs is derived, and it is found that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks.
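The equivalence referenced here is usually expressed through a layer-wise kernel recursion; a standard form (notation mine, with weight variance \(\sigma_w^2\), bias variance \(\sigma_b^2\), and nonlinearity \(\phi\)) reads

\[
  K^{(1)}(x, x') = \sigma_b^{2} + \sigma_w^{2}\,\frac{x^{\top} x'}{d_{\mathrm{in}}},
  \qquad
  K^{(l+1)}(x, x') = \sigma_b^{2} + \sigma_w^{2}\,
  \mathbb{E}_{f \sim \mathcal{GP}\left(0,\, K^{(l)}\right)}\!\left[\phi\!\left(f(x)\right)\phi\!\left(f(x')\right)\right],
\]

so that in the infinite-width limit the network output is a centred GP with covariance \(K^{(L)}\).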

Gaussian Process Behaviour in Wide Deep Neural Networks

It is shown that, under broad conditions, as the architecture is made increasingly wide, the implied random function converges in distribution to a Gaussian process, formalising and extending existing results by Neal (1996) to deep networks.

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

This work shows that for wide NNs the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters.
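Concretely, the linear model in question is the first-order Taylor expansion of the network in its parameters (notation mine),

\[
  f^{\mathrm{lin}}_{\theta}(x) = f_{\theta_0}(x) + \nabla_{\theta} f_{\theta_0}(x)\,\bigl(\theta - \theta_0\bigr),
\]

and under gradient flow on the squared loss the training-set outputs then evolve according to a linear ODE governed by the empirical neural tangent kernel \(\hat{\Theta}(x, x') = \nabla_{\theta} f_{\theta_0}(x)\, \nabla_{\theta} f_{\theta_0}(x')^{\top}\):

\[
  \frac{\mathrm{d} f_t(\mathcal{X})}{\mathrm{d} t} = -\eta\, \hat{\Theta}(\mathcal{X}, \mathcal{X})\,\bigl(f_t(\mathcal{X}) - \mathcal{Y}\bigr).
\]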

Mean Field Residual Networks: On the Edge of Chaos

It is shown, theoretically as well as empirically, that common initializations such as the Xavier or the He schemes are not optimal for residual networks, because the optimal initialization variances depend on the depth.

On Exact Computation with an Infinitely Wide Neural Net

The current paper gives the first efficient exact algorithm for computing the extension of NTK to convolutional neural nets, called the Convolutional NTK (CNTK), as well as an efficient GPU implementation of this algorithm.

Deep Convolutional Networks as shallow Gaussian Processes

We show that the output of a (residual) convolutional neural network (CNN) with an appropriate prior over the weights and biases is a Gaussian process (GP) in the limit of infinitely many convolutional filters.

Deep Information Propagation

It is shown that the presence of dropout destroys the order-to-chaos critical point and therefore strongly limits the maximum trainable depth of random networks. A mean field theory for backpropagation is also developed, showing that the ordered and chaotic phases correspond to regions of vanishing and exploding gradients, respectively.
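The mean field analysis referenced here tracks pre-activation statistics layer by layer; without the dropout modification studied in the paper, the standard variance map for a fully connected network with nonlinearity \(\phi\) (notation mine) is

\[
  q^{l+1} = \sigma_w^{2}\, \mathbb{E}_{z \sim \mathcal{N}\left(0,\, q^{l}\right)}\!\left[\phi(z)^{2}\right] + \sigma_b^{2},
\]

and the order-to-chaos boundary is located where the slope of the corresponding correlation map at its fixed point, \(\chi = \sigma_w^{2}\, \mathbb{E}\!\left[\phi'(z)^{2}\right]\), equals one: \(\chi < 1\) gives the ordered phase (vanishing gradients), \(\chi > 1\) the chaotic phase (exploding gradients).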

Exponential expressivity in deep neural networks through transient chaos

The theoretical analysis of the expressive power of deep networks broadly applies to arbitrary nonlinearities, and provides a quantitative underpinning for previously abstract notions about the geometry of deep functions.

Neural Ordinary Differential Equations

This work shows how to scalably backpropagate through any ODE solver, without access to its internal operations, which allows end-to-end training of ODEs within larger models.
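The ability to backpropagate without access to the solver's internals comes from the adjoint sensitivity method: gradients are obtained by solving a second ODE backwards in time. With hidden state \(z(t)\) evolving as \(\mathrm{d}z/\mathrm{d}t = f(z(t), t, \theta)\) and adjoint \(a(t) = \partial L / \partial z(t)\), the adjoint equations take the form (up to notation)

\[
  \frac{\mathrm{d} a(t)}{\mathrm{d} t} = -\, a(t)^{\top}\, \frac{\partial f\bigl(z(t), t, \theta\bigr)}{\partial z},
  \qquad
  \frac{\mathrm{d} L}{\mathrm{d} \theta} = -\int_{t_1}^{t_0} a(t)^{\top}\, \frac{\partial f\bigl(z(t), t, \theta\bigr)}{\partial \theta}\, \mathrm{d} t .
\]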