Corpus ID: 235731899

Scale Mixtures of Neural Network Gaussian Processes

Hyung-Chung Lee, Eunggu Yun, Hongseok Yang, Juho Lee
Recent works have revealed that infinitely-wide feed-forward or recurrent neural networks of any architecture correspond to Gaussian processes, referred to as Neural Network Gaussian Processes (NNGPs). While these works have significantly extended the class of neural networks converging to Gaussian processes, there has been little focus on broadening the class of stochastic processes that such neural networks converge to. In this work, inspired by the scale mixture of Gaussian random… 
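The construction the abstract alludes to can be illustrated in isolation. Below is a minimal sketch, with assumed inverse-gamma hyperparameters not taken from the paper: drawing the variance from an inverse-gamma prior and then sampling a Gaussian with that variance yields a heavier-tailed Student-t marginal. The paper lifts this classical scale-mixture construction to NNGP kernels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Scale mixture of Gaussians (hyperparameters a, b are assumed values,
# chosen only for illustration): sigma^2 ~ InvGamma(a, b), then
# x | sigma^2 ~ N(0, sigma^2).  The marginal of x is Student-t with
# 2a degrees of freedom -- polynomial rather than Gaussian tails.
a, b = 2.0, 2.0
n = 100_000
sigma2 = b / rng.gamma(a, size=n)        # inverse-gamma variance draws
x = rng.normal(0.0, np.sqrt(sigma2))     # Gaussian given the variance

# Theoretical marginal variance is b / (a - 1) = 2.
print(x.std())
```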
1 Citation


Depth induces scale-averaging in overparameterized linear Bayesian neural networks
Finite deep linear Bayesian neural networks are interpreted as data-dependent scale mixtures of Gaussian process predictors across output channels, connecting limiting results obtained in previous studies within a unified framework.


Deep Neural Networks as Gaussian Processes
The exact equivalence between infinitely wide deep networks and GPs is derived, and it is found that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks.
Gaussian Process Behaviour in Wide Deep Neural Networks
It is shown that, under broad conditions, as the architecture is made increasingly wide, the implied random function converges in distribution to a Gaussian process, formalising and extending existing results by Neal (1996) to deep networks.
Stable behaviour of infinitely wide deep neural networks
The infinite-width limit of the NN is a stochastic process whose finite-dimensional distributions are multivariate stable distributions, which generalizes the class of Gaussian processes recently obtained as infinite-width limits of NNs.
Infinite-channel deep stable convolutional neural networks
This paper assumes iid parameters distributed according to a stable distribution and shows that the infinite-channel limit of deep feed-forward convolutional NNs, under suitable scaling, is a stochastic process with multivariate stable finite-dimensional distributions.
Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
This work shows that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters.
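The linearization described in the summary above is easy to check numerically at small scale. This is a hedged toy sketch, not the paper's training setup: for a small parameter perturbation, a network's output is well approximated by its first-order Taylor expansion around the initial parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model (assumed form): x -> mean(tanh(x * w)) over a 512-dim
# parameter vector w.  We compare the exact output after a small
# parameter update with the linear (first-order Taylor) model.
def f(w, x):
    return np.tanh(x * w).mean()

w0 = rng.normal(size=512) / np.sqrt(512)   # initial parameters
x = 0.7                                    # a fixed scalar input

# Forward-difference gradient of f with respect to w at w0.
eps = 1e-6
grad = np.array([
    (f(w0 + eps * np.eye(512)[i], x) - f(w0, x)) / eps
    for i in range(512)
])

dw = rng.normal(size=512) * 1e-3           # small parameter update
exact = f(w0 + dw, x)
linear = f(w0, x) + grad @ dw              # linearized model
print(abs(exact - linear))                 # tiny for small dw
```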
Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes
This work derives an analogous equivalence for multi-layer convolutional neural networks (CNNs) both with and without pooling layers, and introduces a Monte Carlo method to estimate the GP corresponding to a given neural network architecture, even in cases where the analytic form has too many terms to be computationally feasible.
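The Monte Carlo idea summarized above can be sketched for a toy case. This is a hedged illustration, not the paper's CNN implementation: since the NNGP kernel K(x, x') is the prior covariance of network outputs, it can be estimated by averaging output products over many random networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo NNGP kernel estimate for a one-hidden-layer ReLU net
# on scalar inputs (assumed toy architecture; the cited paper does
# this for deep CNNs).  Each row of w1, w2 is one random network.
def mc_nngp_kernel(x1, x2, n_nets=20_000, width=64):
    w1 = rng.normal(size=(n_nets, width))                   # input weights
    w2 = rng.normal(size=(n_nets, width)) / np.sqrt(width)  # output weights
    out = lambda x: (np.maximum(w1 * x, 0.0) * w2).sum(axis=1)
    return (out(x1) * out(x2)).mean()

# For this net, E[f(x1) f(x2)] = x1 * x2 / 2 when x1, x2 > 0.
print(mc_nngp_kernel(1.0, 2.0))   # close to 1.0
```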
Tensor Programs I: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes
  Greg Yang, 2019
This work introduces a language for expressing neural network computations, and it is shown that this Neural Network-Gaussian Process correspondence surprisingly extends to all modern feedforward or recurrent neural networks composed of multilayer perceptrons, RNNs, and/or layer normalization.
Infinite attention: NNGP and NTK for deep attention networks
A rigorous extension of results to NNs involving attention layers is provided, showing that unlike single-head attention, which induces non-Gaussian behaviour, multi-head attention architectures behave as GPs as the number of heads tends to infinity.
Deep Convolutional Networks as shallow Gaussian Processes
We show that the output of a (residual) convolutional neural network (CNN) with an appropriate prior over the weights and biases is a Gaussian process (GP) in the limit of infinitely many convolutional filters.
Priors for Infinite Networks
In this chapter, I show that priors over network parameters can be defined in such a way that the corresponding priors over functions computed by the network reach reasonable limits as the number of hidden units goes to infinity.
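Neal's limiting argument can be illustrated empirically. A hedged sketch, assuming a tanh activation and unit-variance weight priors: scaling the hidden-to-output weights by 1/sqrt(H) keeps the prior output variance fixed while the output distribution becomes increasingly Gaussian as the number of hidden units H grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def output_samples(H, n=10_000, x=1.0):
    """Prior draws of a one-hidden-layer tanh network's output at
    input x, with hidden-to-output weights scaled by 1/sqrt(H)."""
    w1 = rng.normal(size=(n, H))                # input-to-hidden weights
    w2 = rng.normal(size=(n, H)) / np.sqrt(H)   # scaled output weights
    return (np.tanh(w1 * x) * w2).sum(axis=1)

# The output variance stays put while the excess kurtosis shrinks
# toward 0 (the Gaussian value) as H grows.
for H in (1, 10, 300):
    y = output_samples(H)
    kurt = ((y - y.mean()) ** 4).mean() / y.var() ** 2 - 3
    print(H, round(y.var(), 3), round(kurt, 3))
```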