Neural networks and quantum field theory

@article{Halverson2020NeuralNA,
title={Neural networks and quantum field theory},
author={James Halverson and Anindita Maiti and Keegan Stoner},
journal={Machine Learning: Science and Technology},
year={2020},
volume={2}
}
• Published 19 August 2020
• Computer Science
• Machine Learning: Science and Technology
We propose a theoretical understanding of neural networks in terms of Wilsonian effective field theory. The correspondence relies on the fact that many asymptotic neural networks are drawn from Gaussian processes (GPs), the analog of non-interacting field theories. Moving away from the asymptotic limit yields a non-Gaussian process (NGP) and corresponds to turning on particle interactions, allowing for the computation of correlation functions of neural network outputs with Feynman diagrams…
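The GP/NGP picture in the abstract can be checked numerically: for an ensemble of random finite-width networks, the non-Gaussianity of the output distribution (measured here as excess kurtosis, which tracks the connected four-point function) shrinks as the width grows. Below is a minimal sketch, not taken from the paper, assuming one ReLU hidden layer and i.i.d. Gaussian weights with the standard 1/sqrt(fan-in) scaling; the helper name `random_net_outputs` is ours.

```python
import numpy as np

def random_net_outputs(width, n_samples, x, rng):
    """Outputs of an ensemble of random one-hidden-layer ReLU networks
    evaluated at a fixed input x. Weights are i.i.d. Gaussian with the
    standard 1/sqrt(fan-in) scaling, so the width -> infinity limit of
    the output distribution is a Gaussian process."""
    d = x.shape[0]
    W = rng.normal(0.0, 1.0 / np.sqrt(d), size=(n_samples, width, d))
    v = rng.normal(0.0, 1.0 / np.sqrt(width), size=(n_samples, width))
    h = np.maximum(W @ x, 0.0)          # hidden activations: (n_samples, width)
    return np.einsum("nw,nw->n", v, h)  # one scalar output per sampled network

rng = np.random.default_rng(0)
x = np.ones(3)
results = {}
for width in (2, 1000):
    out = random_net_outputs(width, 50_000, x, rng)
    z = (out - out.mean()) / out.std()
    # Excess kurtosis vanishes for a Gaussian and decays roughly like
    # 1/width here, mirroring the "turning off interactions" picture.
    results[width] = np.mean(z**4) - 3.0
```

For the narrow (width 2) ensemble the excess kurtosis is order one, while for width 1000 it is close to zero, consistent with the GP limit described in the abstract.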
• Yang-Hui He
• Computer Science
International Journal of Modern Physics A
• 2021
The program of machine-learning mathematical structures is discussed, and the tantalizing question of how it can help in doing mathematics is addressed, ranging from mathematical physics to geometry, representation theory, combinatorics, and number theory.
• Computer Science
Physical Review D
• 2021
It is demonstrated that the $\phi^4$ scalar field theory satisfies the Hammersley-Clifford theorem, therefore recasting it as a machine learning algorithm within the mathematically rigorous framework of Markov random fields.
• Computer Science
Mach. Learn. Sci. Technol.
• 2022
The aim is to provide a useful formalism for investigating NN behavior beyond the large-width limit in a non-perturbative fashion; a major result of this analysis is that changing the standard deviation of the NN weight distribution can be interpreted as a renormalization flow in the space of networks.
• Computer Science
SciPost Physics
• 2022
This work explicitly constructs the quantum field theory corresponding to a general class of deep neural networks, encompassing both recurrent and feedforward architectures, and provides a first-principles approach to the rapidly emerging NN-QFT correspondence.
• Computer Science
ArXiv
• 2021
It is demonstrated that the amount of symmetry in the initialization density affects the accuracy of networks trained on Fashion-MNIST, and that symmetry breaking helps only when it is in the direction of ground truth.
• Physics, Computer Science
ArXiv
• 2022
This work states that changing the standard deviation of the neural network weight distribution corresponds to a renormalization in the space of networks and discusses preliminary numerical results for translation-invariant kernels.
• Computer Science
• 2022
This work gives a proof of unimodality for linear kernels, and a number of experiments in the nonlinear case in which all deep kernel machine initializations the authors tried converged to the same solution.
• Computer Science
ArXiv
• 2021
This work derives exact solutions for the output priors for individual input examples of a class of finite fully-connected feedforward Bayesian neural networks.

References


It is demonstrated that for an ensemble of large, finite, fully connected networks with a single hidden layer the distribution of outputs at initialization is well described by a Gaussian perturbed by the fourth Hermite polynomial for weights drawn from a symmetric distribution.
The methodology developed here allows us to track the flow of preactivation distributions by progressively integrating out random variables from lower to higher layers, reminiscent of renormalization-group flow, and develops a perturbative procedure to perform Bayesian inference with weakly non-Gaussian priors.
• Computer Science
ICLR
• 2018
The exact equivalence between infinitely wide deep networks and GPs is derived, and it is found that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks.
• Computer Science
ArXiv
• 2018
This work derives an analogous equivalence for multi-layer convolutional neural networks both with and without pooling layers, and introduces a Monte Carlo method to estimate the GP corresponding to a given neural network architecture, even in cases where the analytic form has too many terms to be computationally feasible.
This work opens a way toward design of even stronger Gaussian Processes, initialization schemes to avoid gradient explosion/vanishing, and deeper understanding of SGD dynamics in modern architectures.
• Computer Science
ICLR
• 2020
The method is an adaptation of Feynman diagrams, a standard tool for computing multivariate Gaussian integrals, and applies to study training dynamics, improving existing bounds and deriving new results on wide network evolution during stochastic gradient descent.
For neural networks with a wide class of weight-priors, it can be shown that in the limit of an infinite number of hidden units the prior over functions tends to a Gaussian process.
• Computer Science
ArXiv
• 2020
A DNN training protocol involving noise is introduced whose outcome is mappable to a certain non-Gaussian stochastic process; it is able to predict the outputs of empirical finite networks with high accuracy, improving upon the accuracy of GP predictions by over an order of magnitude.
• Computer Science
ArXiv
• 2020
It is shown that the standard and NTK parametrizations of a neural network do not admit infinite-width limits that can learn features, which is crucial for pretraining and transfer learning such as with BERT, and that any such infinite-width limit can be computed using the Tensor Programs technique.