# Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide Neural Networks

```
@article{Bordelon2022SelfConsistentDF,
  title   = {Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide Neural Networks},
  author  = {Blake Bordelon and Cengiz Pehlevan},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/2205.09653}
}
```

We analyze feature learning in infinite width neural networks trained with gradient flow through a self-consistent dynamical field theory. We construct a collection of deterministic dynamical order parameters which are inner-product kernels for hidden unit activations and gradients in each layer at pairs of time points, providing a reduced description of network activity through training. These kernel order parameters collectively define the hidden layer activation distribution, the evolution of…
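
As a rough illustration of what these order parameters are (not the paper's DMFT derivation), here is a minimal numpy sketch that trains an assumed toy two-layer tanh network and records the feature-kernel order parameter Φ(t, t')μν = h<sub>μ</sub>(t)·h<sub>ν</sub>(t')/N, i.e. the inner-product kernel of hidden activations at pairs of checkpoint times; the network sizes, learning rate, and checkpoint schedule are illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, P = 512, 8, 16                     # width, input dim, samples
X = rng.standard_normal((P, D)) / np.sqrt(D)
y = np.sign(X[:, 0])

W = rng.standard_normal((N, D))          # first-layer weights
a = rng.standard_normal(N) / np.sqrt(N)  # readout weights

lr, steps, record_every = 0.01, 200, 50
snapshots = []                           # hidden activations at checkpoints

for t in range(steps + 1):
    H = np.tanh(X @ W.T)                 # (P, N) hidden activations
    if t % record_every == 0:
        snapshots.append(H.copy())
    err = H @ a - y                      # network output minus targets
    grad_a = H.T @ err / P               # gradient of MSE w.r.t. readout
    grad_W = ((1 - H**2) * a * err[:, None]).T @ X / P
    a -= lr * grad_a
    W -= lr * grad_W

# kernel order parameter: Phi(t, t')_{mu nu} = h_mu(t) . h_nu(t') / N
T = len(snapshots)
Phi = np.array([[snapshots[s] @ snapshots[t].T / N for t in range(T)]
                for s in range(T)])
print(Phi.shape)                         # (T, T, P, P)
```

At finite width these kernels are random; the theory's claim is that in the infinite-width limit they concentrate on deterministic trajectories.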

## 7 Citations

### Meta-Principled Family of Hyperparameter Scaling Strategies

- Computer Science · ArXiv
- 2022

A one-parameter family of hyperparameter scaling strategies that interpolates between the neural-tangent scaling and mean-field/maximal-update scaling is derived, revealing a proper way to scale depth with width such that resultant large-scale models maintain their representation-learning ability.

### The Influence of Learning Rule on Representation Dynamics in Wide Neural Networks

- Computer Science · ArXiv
- 2022

It is shown that the initial correlation ρ between forward- and backward-pass weights alters the inductive bias of feedback alignment (FA) in both the lazy and rich regimes, a step towards understanding learned representations in neural networks.

### A theory of representation learning in deep neural networks gives a deep generalisation of kernel methods

- Computer Science
- 2021

A new infinite width limit, the representation learning limit, is developed that exhibits representation learning mirroring that in finite-width networks, yet at the same time remains extremely tractable.

### A Kernel Analysis of Feature Learning in Deep Neural Networks

- Computer Science, Biology · 2022 58th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
- 2022

This work empirically studies the kernels induced by the layer representations during training, analyzing their alignment with the network's target function, and shows that representations from earlier to deeper layers increasingly align with the target task on both training and test sets, implying better generalization.

### Second-order regression models exhibit progressive sharpening to the edge of stability

- Computer Science, Mathematics · ArXiv
- 2022

This work proves that for quadratic objectives in two dimensions, this second-order regression model exhibits progressive sharpening of the NTK eigenvalue towards a value that differs slightly from the edge of stability, which is explicitly computed.

### Dynamical Mean Field Theory of Kernel Evolution in Wide Neural Networks

- Computer Science
- 2022

A collection of deterministic dynamical order parameters, which are inner-product kernels for hidden unit activations and gradients in each layer at pairs of time points, is constructed, providing a reduced description of network activity through training.

### Decomposing neural networks as mappings of correlation functions

- Computer Science · Physical Review Research
- 2022

The mapping between probability distributions implemented by a deep feed-forward network is studied as an iterated transformation of distributions, where the non-linearity in each layer transfers information between different orders of correlation functions, identifying the essential statistics in the data.

## References

Showing 1–10 of 97 references.

### Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit

- Computer Science, Mathematics · COLT
- 2019

This paper shows that the number of hidden units needs only be larger than a quantity that depends on the regularity properties of the data and is independent of the dimension, and generalizes this analysis to unbounded activation functions.

### A self consistent theory of Gaussian Processes captures feature learning effects in finite CNNs

- Computer Science · NeurIPS
- 2021

This work considers DNNs trained with noisy gradient descent on a large training set, derives a self-consistent Gaussian process theory accounting for strong finite-DNN and feature learning effects, and identifies a sharp transition between a feature learning regime and a lazy learning regime in this model.

### A Theory of Neural Tangent Kernel Alignment and Its Influence on Training

- Computer Science
- 2021

This work seeks to theoretically understand kernel alignment, a prominent and ubiquitous structural change that aligns the NTK with the target function, and identifies factors in network architecture and data structure that drive kernel alignment.

### Wide neural networks of any depth evolve as linear models under gradient descent

- Computer Science · NeurIPS
- 2019

This work shows that for wide NNs the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters.
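
This linearization claim can be checked numerically. The sketch below (an illustration with an assumed toy two-layer tanh network; the sizes and learning rate are invented, not from the cited paper) trains the network by gradient descent and, in parallel, evolves the linear model governed by the empirical neural tangent kernel at initialization, then compares the two output trajectories:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, P = 2048, 4, 8                       # width, input dim, samples
X = rng.standard_normal((P, D))
y = rng.standard_normal(P)

# two-layer net in NTK parameterization: f(x) = a . tanh(W x) / sqrt(N)
W, a = rng.standard_normal((N, D)), rng.standard_normal(N)
W0, a0 = W.copy(), a.copy()

def f(W, a):
    return np.tanh(X @ W.T) @ a / np.sqrt(N)

# empirical NTK at initialization: K = J J^T over all parameters
H0 = np.tanh(X @ W0.T)
Ja = H0 / np.sqrt(N)                                  # (P, N)   df/da
JW = ((1 - H0**2) * a0)[:, :, None] * X[:, None, :]   # (P, N, D) df/dW
JW = JW.reshape(P, -1) / np.sqrt(N)
K = Ja @ Ja.T + JW @ JW.T

lr, steps = 0.05, 400
f_lin = f(W0, a0)
f0 = f_lin.copy()
for _ in range(steps):
    # full network: gradient descent on L = 1/2 ||f - y||^2
    H = np.tanh(X @ W.T)
    err = H @ a / np.sqrt(N) - y
    grad_a = H.T @ err / np.sqrt(N)
    grad_W = ((1 - H**2) * a * err[:, None]).T @ X / np.sqrt(N)
    a -= lr * grad_a
    W -= lr * grad_W
    # linearized model: same discrete dynamics with the frozen init NTK
    f_lin -= lr * K @ (f_lin - y)

gap = np.linalg.norm(f(W, a) - f_lin) / np.linalg.norm(f0 - y)
print(gap)   # shrinks with width: training tracks the linear model
```

As the width N grows, the relative gap between the two trajectories shrinks, which is the content of the lazy/linearized regime.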

### Statistical Mechanics of Deep Linear Neural Networks: The Backpropagating Kernel Renormalization

- Computer Science · Physical Review X
- 2021

This work is the first exact statistical-mechanical study of learning in a family of deep neural networks, and the first successful theory of learning through the successive integration of degrees of freedom in the learned weight space.

### Unified Field Theory for Deep and Recurrent Neural Networks

- Computer Science
- 2021

A unified and systematic derivation of the mean-field theory for both architectures is presented, starting from first principles with established methods from the statistical physics of disordered systems, and exposing that Gaussian processes are but the lowest order of a systematic expansion in 1/n.

### Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks

- Computer Science · ICML
- 2020

A new spectral principle is identified: as the size of the training set grows, kernel machines and neural networks fit successively higher spectral modes of the target function.
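
This spectral principle is easy to reproduce in a toy setting. The sketch below (an assumed example, not the cited paper's setup: the translation-invariant kernel, the 1/k² eigenvalue decay, and the two target frequencies are invented for illustration) runs ridgeless kernel regression on a periodic domain, where the kernel's eigenfunctions are Fourier modes, and measures how well each target mode is fit as the training set grows:

```python
import numpy as np

# dense periodic grid for evaluating a translation-invariant toy kernel
# whose eigenfunctions are Fourier modes with eigenvalues decaying as 1/k^2
M = 256
xs = np.linspace(0, 2 * np.pi, M, endpoint=False)

def kernel(x, z):
    d = x[:, None] - z[None, :]
    return sum(np.cos(k * d) / k**2 for k in range(1, 9))

# target mixes a low-frequency mode (k=1, large eigenvalue) and a
# high-frequency mode (k=7, small eigenvalue)
def target(x):
    return np.cos(x) + np.cos(7 * x)

def mode_error(P):
    """Ridgeless kernel regression on P evenly spaced points; return the
    absolute error in the learned coefficient of each target mode."""
    xtr = xs[:: M // P]
    K = kernel(xtr, xtr) + 1e-8 * np.eye(P)      # tiny jitter for stability
    alpha = np.linalg.solve(K, target(xtr))
    res = kernel(xs, xtr) @ alpha - target(xs)   # residual on the dense grid
    return {k: abs(res @ np.cos(k * xs)) * 2 / M for k in (1, 7)}

for P in (4, 16, 64):
    print(P, mode_error(P))
```

With few samples the low-frequency (top-eigenvalue) mode is fit far better than the high-frequency one; as P grows, successively higher modes are captured, matching the principle stated above.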

### Scaling Limits of Wide Neural Networks with Weight Sharing: Gaussian Process Behavior, Gradient Independence, and Neural Tangent Kernel Derivation

- Computer Science · ArXiv
- 2019

This work opens a way toward the design of even stronger Gaussian processes, initialization schemes that avoid gradient explosion/vanishing, and a deeper understanding of SGD dynamics in modern architectures.

### Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification

- Computer Science · NeurIPS
- 2020

This work analyzes in closed form the learning dynamics of stochastic gradient descent for a single-layer neural network classifying a high-dimensional Gaussian mixture, where each cluster is assigned one of two labels, and explores the performance of the algorithm as a function of the control parameters, shedding light on how it navigates the loss landscape.

### Mean Field Residual Networks: On the Edge of Chaos

- Computer Science · NIPS
- 2017

It is shown, theoretically as well as empirically, that common initializations such as the Xavier or He schemes are not optimal for residual networks, because the optimal initialization variances depend on depth.