Neural networks and quantum field theory

  • James Halverson, Anindita Maiti, Keegan Stoner
  • Machine Learning: Science and Technology
We propose a theoretical understanding of neural networks in terms of Wilsonian effective field theory. The correspondence relies on the fact that many asymptotic neural networks are drawn from Gaussian processes (GPs), the analog of non-interacting field theories. Moving away from the asymptotic limit yields a non-Gaussian process (NGP) and corresponds to turning on particle interactions, allowing for the computation of correlation functions of neural network outputs with Feynman diagrams… 
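As a rough numerical illustration of the Gaussian-process limit (not code from the paper), the sketch below samples many random single-hidden-layer tanh networks at one fixed input and measures the excess kurtosis of the output. Excess kurtosis is a simple proxy for the non-Gaussian ("interacting") part of the output distribution. The 1/sqrt(width) output scaling, standard normal weights, tanh activation and the input x are all assumptions chosen for illustration.

```python
import numpy as np

def sample_outputs(width, n_nets, x, rng):
    """Sample f(x) from n_nets independent random single-hidden-layer tanh networks.

    Hypothetical parametrization chosen for illustration:
        f(x) = (1/sqrt(width)) * sum_j v_j * tanh(w_j . x),  w_j, v_j ~ N(0, I).
    """
    w = rng.standard_normal((n_nets, width, x.shape[0]))  # input-to-hidden weights
    v = rng.standard_normal((n_nets, width))              # hidden-to-output weights
    hidden = np.tanh(w @ x)                               # shape (n_nets, width)
    return (v * hidden).sum(axis=1) / np.sqrt(width)

def excess_kurtosis(samples):
    """Fourth standardized cumulant; exactly zero for a Gaussian."""
    c = samples - samples.mean()
    return np.mean(c**4) / np.mean(c**2) ** 2 - 3.0

def kurtosis_by_width(widths, n_nets=20_000, seed=0):
    """Estimate the output excess kurtosis for each width in `widths`."""
    rng = np.random.default_rng(seed)
    x = np.array([1.0, -0.5])  # arbitrary fixed input
    return {n: excess_kurtosis(sample_outputs(n, n_nets, x, rng)) for n in widths}

if __name__ == "__main__":
    for n, k in kurtosis_by_width([1, 4, 16, 64]).items():
        print(f"width={n:3d}  excess kurtosis={k:+.3f}")
```

Because the hidden units are i.i.d., cumulants of the sum add and the 1/sqrt(width) scaling divides the fourth cumulant by width squared, so the excess kurtosis falls off as 1/width, up to Monte Carlo noise.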

Universes as big data

  • Yang-Hui He
  • Computer Science
    International Journal of Modern Physics A
  • 2021
The program in machine-learning mathematical structures is discussed and the tantalizing question of how it helps in doing mathematics is addressed, ranging from mathematical physics, to geometry, to representation theory, to combinatorics and to number theory.

Quantum field-theoretic machine learning

It is demonstrated that the $\phi^{4}$ scalar field theory satisfies the Hammersley-Clifford theorem, thereby recasting it as a machine learning algorithm within the mathematically rigorous framework of Markov random fields.
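The Markov-random-field reading can be made concrete with a small sketch. It assumes a one-dimensional periodic lattice with unit spacing and illustrative couplings m2 and lam, which are hypothetical choices rather than the paper's conventions. The lattice action is a sum of single-site and nearest-neighbour terms, so P[phi] ∝ exp(-S[phi]) factorizes over the cliques of the lattice graph. As a result, the change in S from updating one site depends only on that site's two neighbours, and a local Metropolis sweep needs nothing else.

```python
import numpy as np

def action(phi, m2=1.0, lam=1.0):
    """Euclidean lattice phi^4 action on a periodic 1D lattice (unit spacing).

    S = sum_x [ 1/2 (phi_{x+1} - phi_x)^2 + m2/2 phi_x^2 + lam/4! phi_x^4 ]
    """
    kinetic = 0.5 * np.sum((np.roll(phi, -1) - phi) ** 2)        # edge (pair) cliques
    potential = np.sum(0.5 * m2 * phi**2 + lam / 24.0 * phi**4)  # single-site cliques
    return kinetic + potential

def local_action(phi, i, value, m2=1.0, lam=1.0):
    """Only the terms of S that involve site i, with phi_i set to `value`.

    Depends on phi[i-1] and phi[i+1] alone: the Markov blanket of site i.
    """
    n = len(phi)
    left, right = phi[(i - 1) % n], phi[(i + 1) % n]
    kinetic = 0.5 * ((value - left) ** 2 + (right - value) ** 2)
    return kinetic + 0.5 * m2 * value**2 + lam / 24.0 * value**4

def metropolis_sweep(phi, rng, step=1.0, m2=1.0, lam=1.0):
    """One local Metropolis sweep; each accept/reject uses only neighbouring sites."""
    for i in range(len(phi)):
        proposal = phi[i] + step * rng.uniform(-1.0, 1.0)
        dS = local_action(phi, i, proposal, m2, lam) - local_action(phi, i, phi[i], m2, lam)
        if dS <= 0 or rng.random() < np.exp(-dS):
            phi[i] = proposal
    return phi

rng = np.random.default_rng(0)
phi = rng.standard_normal(16)
trial = phi.copy()
trial[5] = 0.7
delta_full = action(trial) - action(phi)
delta_local = local_action(phi, 5, 0.7) - local_action(phi, 5, phi[5])
print(np.isclose(delta_full, delta_local))  # prints True: the local terms suffice
for _ in range(100):
    phi = metropolis_sweep(phi, rng)
```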

Non-perturbative renormalization for the neural network-QFT correspondence

The aim is to provide a useful formalism to investigate NN behavior beyond the large-width limit in a non-perturbative fashion. A major result of this analysis is that changing the standard deviation of the NN weight distribution can be interpreted as a renormalization flow in the space of networks.

The edge of chaos: quantum field theory and deep neural networks

This work explicitly constructs the quantum field theory corresponding to a general class of deep neural networks encompassing both recurrent and feedforward architectures, and provides a first-principles approach to the rapidly emerging NN-QFT correspondence.

Symmetry-via-Duality: Invariant Neural Network Densities from Parameter-Space Correlators

It is demonstrated that the amount of symmetry in the initialization density affects the accuracy of networks trained on Fashion-MNIST, and that symmetry breaking helps only when it is in the direction of ground truth.

Renormalization in the neural network-quantum field theory correspondence

This work states that changing the standard deviation of the neural network weight distribution corresponds to a renormalization in the space of networks and discusses preliminary numerical results for translation-invariant kernels.

Deep kernel machines: exact inference with representation learning in infinite Bayesian neural networks

This work gives a proof of unimodality for linear kernels and reports a number of experiments in the nonlinear case in which all deep kernel machine initializations the authors tried converged to the same solution.

Exact priors of finite neural networks

This work derives exact solutions for the output priors for individual input examples of a class of finite fully-connected feedforward Bayesian neural networks.

Finite size corrections for neural network Gaussian processes

It is demonstrated that for an ensemble of large, finite, fully connected networks with a single hidden layer the distribution of outputs at initialization is well described by a Gaussian perturbed by the fourth Hermite polynomial for weights drawn from a symmetric distribution.
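A sketch of the perturbed density, assuming the output has been standardized to zero mean and unit variance and writing kappa4 for its excess kurtosis. Here kappa4 is simply a free parameter; this sketch does not model how width or the weight distribution determine it. With the probabilists' Hermite polynomial He_4(y) = y^4 - 6y^2 + 3, the density phi(y)[1 + kappa4 He_4(y)/24] keeps the normalization and variance of the standard Gaussian and shifts only the fourth moment, to 3 + kappa4.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def perturbed_gaussian(y, kappa4):
    """Standard normal density times [1 + kappa4/24 * He_4(y)].

    He_4(y) = y^4 - 6 y^2 + 3 is the probabilists' Hermite polynomial; y is the
    standardized output and kappa4 its excess kurtosis (a free parameter here).
    """
    gauss = np.exp(-0.5 * y**2) / np.sqrt(2.0 * np.pi)
    he4 = hermeval(y, [0, 0, 0, 0, 1])  # coefficient vector selects He_4
    return gauss * (1.0 + kappa4 / 24.0 * he4)

def moments(kappa4, orders=(0, 2, 4)):
    """Moments E[y^k]: Riemann sum on a wide uniform grid.

    Very accurate here because the integrand is smooth and decays like a Gaussian.
    """
    y = np.linspace(-12.0, 12.0, 24_001)
    dy = y[1] - y[0]
    p = perturbed_gaussian(y, kappa4)
    return {k: float(np.sum(y**k * p) * dy) for k in orders}

print(moments(0.3))  # approximately {0: 1.0, 2: 1.0, 4: 3.3}
```

This is a truncated (Gram-Charlier-type) expansion, so it is only sensible for small |kappa4|: the bracket becomes negative somewhere once kappa4 < 0 or kappa4 > 4.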

Non-Gaussian processes and neural networks at finite widths

The methodology developed here allows us to track the flow of preactivation distributions by progressively integrating out random variables from lower to higher layers, reminiscent of renormalization-group flow; the work also develops a perturbative procedure to perform Bayesian inference with weakly non-Gaussian priors.

Deep Neural Networks as Gaussian Processes

The exact equivalence between infinitely wide deep networks and GPs is derived and it is found that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks.
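For ReLU activations the layer-to-layer kernel map has a closed form (the arc-cosine kernel), so the GP covariance of an infinitely wide fully connected network reduces to a short recursion. This sketch assumes ReLU, weight variance sigma_w2 / fan-in and bias variance sigma_b2 with illustrative defaults. Other nonlinearities may need the per-layer Gaussian expectation computed numerically instead.

```python
import numpy as np

def relu_nngp_kernel(X, depth, sigma_w2=2.0, sigma_b2=0.0):
    """NNGP kernel of an infinitely wide fully connected ReLU network.

    X: (n, d) array of inputs (assumed nonzero). Returns the (n, n) output
    covariance after `depth` hidden layers, via the arc-cosine recursion.
    """
    K = sigma_b2 + sigma_w2 * (X @ X.T) / X.shape[1]  # input-layer covariance
    for _ in range(depth):
        s = np.sqrt(np.diag(K))
        norms = np.outer(s, s)
        cos_t = np.clip(K / norms, -1.0, 1.0)  # clip guards against rounding
        theta = np.arccos(cos_t)
        # E[relu(u) relu(v)] for (u, v) jointly Gaussian with covariance K
        expect = norms * (np.sin(theta) + (np.pi - theta) * cos_t) / (2.0 * np.pi)
        K = sigma_b2 + sigma_w2 * expect
    return K

X = np.array([[1.0, 0.0], [0.6, 0.8], [-1.0, 0.0]])
print(relu_nngp_kernel(X, depth=3))
```

With sigma_w2 = 2 and sigma_b2 = 0 the diagonal K(x, x) is preserved from layer to layer, which makes a quick sanity check. Zero-norm inputs are excluded because the recursion divides by sqrt(K(x, x)).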

Bayesian Convolutional Neural Networks with Many Channels are Gaussian Processes

This work derives an analogous equivalence for multi-layer convolutional neural networks both with and without pooling layers, and introduces a Monte Carlo method to estimate the GP corresponding to a given neural network architecture, even in cases where the analytic form has too many terms to be computationally feasible.
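A Monte Carlo estimate replaces the analytic expectation with an average over randomly drawn hidden units. The sketch below is a simplified stand-in for the convolutional setting. It assumes a single fully connected ReLU hidden layer with hypothetical weight and bias variances sigma_w2 and sigma_b2, and compares the estimate against the closed-form arc-cosine result for that layer.

```python
import numpy as np

def mc_kernel(X, n_units=200_000, sigma_w2=2.0, sigma_b2=0.0, seed=0):
    """Monte Carlo estimate of the one-hidden-layer ReLU NNGP kernel.

    Draws n_units random hidden units and averages products of their
    post-activations: the infinite-width covariance is an expectation over
    the random weights of a single unit.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, n_units)) * np.sqrt(sigma_w2 / d)
    b = rng.standard_normal(n_units) * np.sqrt(sigma_b2)
    A = np.maximum(X @ W + b, 0.0)  # (n_inputs, n_units) post-activations
    return sigma_b2 + sigma_w2 * (A @ A.T) / n_units

def exact_kernel(X, sigma_w2=2.0, sigma_b2=0.0):
    """Closed-form (arc-cosine) result for the same single hidden layer."""
    K0 = sigma_b2 + sigma_w2 * (X @ X.T) / X.shape[1]  # preactivation covariance
    s = np.sqrt(np.diag(K0))
    norms = np.outer(s, s)
    cos_t = np.clip(K0 / norms, -1.0, 1.0)
    theta = np.arccos(cos_t)
    return sigma_b2 + sigma_w2 * norms * (np.sin(theta) + (np.pi - theta) * cos_t) / (2 * np.pi)

X = np.array([[1.0, 0.0], [0.6, 0.8], [-1.0, 0.0]])
print(np.round(mc_kernel(X) - exact_kernel(X), 3))  # entries close to zero
```

The estimation error shrinks like 1/sqrt(n_units). For deeper networks one would instead propagate inputs through a finite random network and average over the last layer's units, which adds a finite-width bias on top of the sampling noise.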

Scaling Limits of Wide Neural Networks with Weight Sharing: Gaussian Process Behavior, Gradient Independence, and Neural Tangent Kernel Derivation

This work opens a way toward the design of even stronger Gaussian processes, initialization schemes that avoid gradient explosion/vanishing, and a deeper understanding of SGD dynamics in modern architectures.

Asymptotics of Wide Networks from Feynman Diagrams

The method is an adaptation of Feynman diagrams, a standard tool for computing multivariate Gaussian integrals, and is applied to study training dynamics, improving existing bounds and deriving new results on wide-network evolution during stochastic gradient descent.
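The combinatorial core of the Feynman-diagram method for Gaussian integrals is Wick's (Isserlis') theorem. A moment of a zero-mean Gaussian vector is a sum over all ways of pairing the indices, and each pairing contributes a product of covariances ("propagators"). A minimal sketch of that bookkeeping, not of the paper's width-scaling analysis:

```python
import numpy as np

def pairings(indices):
    """Yield every perfect matching of a list of indices as a list of pairs."""
    if not indices:
        yield []
        return
    first, rest = indices[0], indices[1:]
    for k in range(len(rest)):
        remaining = rest[:k] + rest[k + 1:]
        for tail in pairings(remaining):
            yield [(first, rest[k])] + tail

def gaussian_moment(cov, idx):
    """E[x_{i1} ... x_{in}] for x ~ N(0, cov), via Isserlis/Wick's theorem.

    Each matching is one 'diagram': a product of propagators cov[a, b].
    """
    if len(idx) % 2:
        return 0.0  # odd moments of a centered Gaussian vanish
    return sum(np.prod([cov[a, b] for a, b in m]) for m in pairings(list(idx)))

cov = np.array([[2.0, 0.5], [0.5, 1.0]])
print(gaussian_moment(cov, [0, 0, 1, 1]))  # cov00*cov11 + 2*cov01**2 = 2.5
```

For 2k indices there are (2k-1)!! pairings (15 for six indices), so direct enumeration is practical only for low-order correlators.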

Computing with Infinite Networks

For neural networks with a wide class of weight priors, it can be shown that in the limit of an infinite number of hidden units, the prior over functions tends to a Gaussian process. In this paper…

Predicting the outputs of finite networks trained with noisy gradients

A DNN training protocol involving noise is considered whose outcome is mappable to a certain non-Gaussian stochastic process; the resulting theory is able to predict the outputs of empirical finite networks with high accuracy, improving upon the accuracy of GP predictions by over an order of magnitude.
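One common way to realize "training with noise" is Langevin dynamics with weight decay. Each step adds Gaussian noise of variance 2·lr·T to a gradient step, so for a small learning rate the iterates approximately sample exp(-L/T). This is an assumption about the protocol, sketched on a toy linear model. The data, the temperature and the weight-decay strength gamma are hypothetical, and the paper's finite-width theory is not reproduced. For a quadratic loss the stationary mean equals the ridge solution, which gives a simple correctness check.

```python
import numpy as np

def langevin_train(grad, w0, lr, temperature, n_steps, rng):
    """Noisy gradient descent (Euler-Maruyama discretization of Langevin dynamics).

    w <- w - lr * grad(w) + sqrt(2 * lr * temperature) * N(0, I).
    For small lr the iterates sample approximately from exp(-L(w) / temperature).
    """
    w = np.array(w0, dtype=float)
    samples = np.empty((n_steps, w.size))
    for t in range(n_steps):
        noise = rng.standard_normal(w.shape)
        w = w - lr * grad(w) + np.sqrt(2.0 * lr * temperature) * noise
        samples[t] = w
    return samples

# Toy problem (hypothetical data): linear model, squared loss plus weight decay.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(50)
gamma = 1.0  # weight-decay strength

def grad(w):
    """Gradient of 1/2 |y - X w|^2 + gamma/2 |w|^2."""
    return X.T @ (X @ w - y) + gamma * w

samples = langevin_train(grad, np.zeros(3), lr=1e-3, temperature=0.1, n_steps=20_000, rng=rng)
posterior_mean = samples[5_000:].mean(axis=0)  # discard burn-in
ridge = np.linalg.solve(X.T @ X + gamma * np.eye(3), X.T @ y)
print(np.round(posterior_mean, 3), np.round(ridge, 3))
```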

Bayesian learning for neural networks

Feature Learning in Infinite-Width Neural Networks

It is shown that the standard and NTK parametrizations of a neural network do not admit infinite-width limits that can learn features, which is crucial for pretraining and transfer learning such as with BERT, and any such infinite-width limit can be computed using the Tensor Programs technique.