# The Principles of Deep Learning Theory

@article{Roberts2021ThePO, title={The Principles of Deep Learning Theory}, author={Daniel A. Roberts and Sho Yaida and Boris Hanin}, journal={ArXiv}, year={2021}, volume={abs/2106.10165} }

This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network…

## 18 Citations

Nonperturbative renormalization for the neural network-QFT correspondence

- Computer Science, Physics
- ArXiv
- 2021

The aim is to provide a useful formalism for investigating neural network behavior beyond the large-width limit in a nonperturbative fashion. A major result of this analysis is that changing the standard deviation of the neural network weight distribution can be interpreted as a renormalization flow in the space of networks.

Random Neural Networks in the Infinite Width Limit as Gaussian Processes

- Computer Science, Mathematics
- ArXiv
- 2021

This article gives a new proof that fully connected neural networks with random weights and biases converge to Gaussian processes in the regime where the input dimension, output dimension, and depth…

Critical initialization of wide and deep neural networks through partial Jacobians: general theory and applications to LayerNorm

- Computer Science, Physics
- ArXiv
- 2021

It is argued that LayerNorm is more stable when applied to preactivations rather than activations, due to a larger correlation depth, and that the normalization layer changes the optimal values of hyperparameters and critical exponents.

The edge of chaos: quantum field theory and deep neural networks

- Computer Science, Physics
- ArXiv
- 2021

This work explicitly constructs the quantum field theory corresponding to a general class of deep neural networks encompassing both recurrent and feedforward architectures, and provides a first-principles approach to the rapidly emerging NN-QFT correspondence.

A self consistent theory of Gaussian Processes captures feature learning effects in finite CNNs

- Computer Science, Physics
- ArXiv
- 2021

This work considers DNNs trained with noisy gradient descent on a large training set and derives a self-consistent Gaussian Process theory accounting for strong finite-DNN and feature learning effects, and identifies a sharp transition between a feature learning regime and a lazy learning regime in this model.

Appearance of random matrix theory in deep learning

- Computer Science, Physics
- Physica A: Statistical Mechanics and its Applications
- 2021

We investigate the local spectral statistics of the loss surface Hessians of artificial neural networks, where we discover agreement with Gaussian Orthogonal Ensemble statistics across several…

Asymptotics of representation learning in finite Bayesian neural networks

- Computer Science, Physics
- ArXiv
- 2021

It is argued that the leading finite-width corrections to the average feature kernels for any Bayesian network with linear readout and Gaussian likelihood have a largely universal form.

Depth induces scale-averaging in overparameterized linear Bayesian neural networks

- Computer Science, Mathematics
- ArXiv
- 2021

Finite deep linear Bayesian neural networks are interpreted as data-dependent scale mixtures of Gaussian process predictors across output channels, allowing us to connect limiting results obtained in previous studies within a unified framework.

Differentiable Physics: A Position Piece

- Computer Science, Physics
- ArXiv
- 2021

It is argued that differentiable physics offers a new paradigm for modeling physical phenomena by combining classical analytic solutions with numerical methodology using the bridge of differentiable programming.

Infinite wide (finite depth) Neural Networks benefit from multi-task learning unlike shallow Gaussian Processes - an exact quantitative macroscopic characterization

- Computer Science, Mathematics
- ArXiv
- 2021

We prove in this paper that optimizing wide ReLU neural networks (NNs) with at least one hidden layer using ℓ2-regularization on the parameters enforces multi-task learning due to…