# Scale Mixtures of Neural Network Gaussian Processes

@article{Lee2021ScaleMO, title={Scale Mixtures of Neural Network Gaussian Processes}, author={Hyungi Lee and Eunggu Yun and Hongseok Yang and Juho Lee}, journal={ArXiv}, year={2021}, volume={abs/2107.01408} }

Recent works have revealed that infinitely wide feed-forward or recurrent neural networks of any architecture correspond to Gaussian processes, referred to as Neural Network Gaussian Processes (NNGPs). While these works have significantly extended the class of neural networks converging to Gaussian processes, there has been little focus on broadening the class of stochastic processes that such networks converge to. In this work, inspired by the scale mixture of Gaussian random…
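The construction hinted at in the abstract can be sketched numerically. A hedged illustration (the prior parameters `a`, `b` and the sample sizes are illustrative choices, not values from the paper): placing an inverse-gamma prior on the variance of a zero-mean Gaussian and integrating it out yields a Student's t marginal, one of the heavier-tailed distributions that a scale mixture of Gaussians can produce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Scale mixture of Gaussians: draw the variance from an inverse-gamma
# prior, then sample a zero-mean Gaussian with that variance. The
# resulting marginal is a Student's t distribution.
a, b = 2.0, 2.0          # inverse-gamma shape and scale (illustrative)
n = 100_000

sigma2 = 1.0 / rng.gamma(a, 1.0 / b, size=n)   # sigma^2 ~ InvGamma(a, b)
samples = rng.normal(0.0, np.sqrt(sigma2))     # x | sigma^2 ~ N(0, sigma^2)

# Equivalent direct draw from the marginal: t with 2a degrees of
# freedom, scaled by sqrt(b / a).
t_samples = rng.standard_t(2 * a, size=n) * np.sqrt(b / a)
```

Comparing empirical quantiles of `samples` and `t_samples` confirms the two constructions agree; the paper applies the same mixing idea at the level of NNGP priors rather than a single Gaussian.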

## One Citation

Depth induces scale-averaging in overparameterized linear Bayesian neural networks

- Computer Science, Mathematics · ArXiv
- 2021

Finite deep linear Bayesian neural networks are interpreted as data-dependent scale mixtures of Gaussian process predictors across output channels, connecting limiting results obtained in previous studies within a unified framework.

## References

Showing 1–10 of 17 references

Deep Neural Networks as Gaussian Processes

- Computer Science, Mathematics · ICLR
- 2018

The exact equivalence between infinitely wide deep networks and GPs is derived, and it is found that test performance increases as finite-width trained networks are made wider and more similar to a GP, so that GP predictions typically outperform those of finite-width networks.
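The equivalence rests on a layer-wise kernel recursion. A minimal sketch for a fully connected ReLU network, using the closed-form arc-cosine expectation E[relu(u) relu(v)] for jointly Gaussian (u, v); the depth and variance hyperparameters are illustrative assumptions, not values from the paper:

```python
import numpy as np

def nngp_relu_kernel(X, depth=3, sigma_w2=2.0, sigma_b2=0.0):
    """NNGP kernel recursion for a fully connected ReLU network.

    X: (n_points, n_features) input matrix. Hyperparameters are
    illustrative; sigma_w2 = 2 keeps the diagonal stable across layers.
    """
    # Layer-0 kernel: scaled inner products of the inputs.
    K = sigma_b2 + sigma_w2 * (X @ X.T) / X.shape[1]
    for _ in range(depth):
        d = np.sqrt(np.diag(K))                     # per-point std dev
        c = np.clip(K / np.outer(d, d), -1.0, 1.0)  # correlations
        theta = np.arccos(c)
        # Closed-form E[relu(u) relu(v)] for correlated Gaussians.
        K = sigma_b2 + sigma_w2 * np.outer(d, d) * (
            np.sin(theta) + (np.pi - theta) * np.cos(theta)
        ) / (2 * np.pi)
    return K
```

For orthogonal unit inputs and one ReLU layer this yields a diagonal of 1 and off-diagonal entries of 1/π, matching the arc-cosine kernel at 90°.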

Gaussian Process Behaviour in Wide Deep Neural Networks

- Computer Science, Mathematics · ICLR
- 2018

It is shown that, under broad conditions, as the architecture is made increasingly wide, the implied random function converges in distribution to a Gaussian process, formalising and extending existing results by Neal (1996) to deep networks.

Stable behaviour of infinitely wide deep neural networks

- Mathematics, Computer Science · AISTATS
- 2020

The infinite-width limit of the NN is a stochastic process whose finite-dimensional distributions are multivariate stable distributions, generalizing the class of Gaussian processes recently obtained as infinite-width limits of NNs.
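The stability property behind this result is easy to check numerically. A hedged sketch (the width and sample counts are arbitrary): with iid Cauchy weights, i.e. α = 1 stable, a width-n pre-activation under 1/n scaling is again exactly Cauchy for every n, so stability rather than the central limit theorem governs the wide limit.

```python
import numpy as np

rng = np.random.default_rng(0)

# With iid Cauchy (alpha = 1 stable) weights, the scaled pre-activation
# sum_j w_j x_j / n is again standard Cauchy regardless of the width n.
width = 1_000                       # layer width (illustrative)
x = np.ones(width)                  # fixed unit inputs for simplicity
w = rng.standard_cauchy(size=(10_000, width))
pre = (w * x).sum(axis=1) / width   # note 1/n scaling, not 1/sqrt(n)
```

The 75% quantile of `pre` sits near tan(π/4) = 1, the standard Cauchy value, at any width; with Gaussian weights the same 1/n scaling would collapse the distribution instead.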

Infinite-channel deep stable convolutional neural networks

- Computer Science, Mathematics · ArXiv
- 2021

This paper assumes iid parameters distributed according to a stable distribution and shows that the infinite-channel limit of deep feed-forward convolutional NNs, under suitable scaling, is a stochastic process with multivariate stable finite-dimensional distributions.

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

- Computer Science, Mathematics · NeurIPS
- 2019

This work shows that for wide neural networks the learning dynamics simplify considerably and that, in the infinite-width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters.

Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes

- Computer Science, Mathematics · ICLR
- 2019

This work derives an analogous equivalence for multi-layer convolutional neural networks (CNNs) both with and without pooling layers, and introduces a Monte Carlo method to estimate the GP corresponding to a given neural network architecture, even in cases where the analytic form has too many terms to be computationally feasible.
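The Monte Carlo idea can be sketched in a shallow fully connected setting (the paper's method targets CNNs; the width, sample count, and weight variance below are illustrative assumptions): draw many random networks and average products of their outputs to estimate the kernel.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_nngp_kernel(X, width=512, n_samples=2000, sigma_w2=2.0):
    """Monte Carlo estimate of the NNGP kernel of a one-hidden-layer
    ReLU network: average f(x) f(x') over random weight draws."""
    n, d = X.shape
    acc = np.zeros((n, n))
    for _ in range(n_samples):
        W1 = rng.normal(0, np.sqrt(sigma_w2 / d), size=(d, width))
        W2 = rng.normal(0, np.sqrt(sigma_w2 / width), size=(width, 1))
        f = np.maximum(X @ W1, 0.0) @ W2   # scalar output per input
        acc += f @ f.T
    return acc / n_samples
```

For orthogonal unit inputs the estimate concentrates around the analytic arc-cosine values (diagonal 1, off-diagonal 1/π), which is how such a sampler can be validated before being applied to architectures with no tractable closed form.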

Tensor Programs I: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes

- Computer Science, Physics · NeurIPS
- 2019

This work introduces a language for expressing neural network computations and shows that the Neural Network-Gaussian Process correspondence surprisingly extends to all modern feedforward or recurrent neural networks composed of multilayer perceptrons, RNNs, and/or layer normalization.

Infinite attention: NNGP and NTK for deep attention networks

- Computer Science, Mathematics · ICML
- 2020

A rigorous extension of results to NNs involving attention layers is provided, showing that unlike single-head attention, which induces non-Gaussian behaviour, multi-head attention architectures behave as GPs as the number of heads tends to infinity.

Deep Convolutional Networks as shallow Gaussian Processes

- Computer Science, Mathematics · ICLR
- 2019

We show that the output of a (residual) convolutional neural network (CNN) with an appropriate prior over the weights and biases is a Gaussian process (GP) in the limit of infinitely many…

Priors for Infinite Networks

- Mathematics
- 1996

In this chapter, I show that priors over network parameters can be defined in such a way that the corresponding priors over functions computed by the network reach reasonable limits as the number of…