# Deep Convolutional Networks as shallow Gaussian Processes

```bibtex
@article{GarrigaAlonso2018DeepCN,
  title   = {Deep Convolutional Networks as shallow Gaussian Processes},
  author  = {Adri{\`a} Garriga-Alonso and Laurence Aitchison and Carl Edward Rasmussen},
  journal = {ArXiv},
  year    = {2018},
  volume  = {abs/1808.05587}
}
```

We show that the output of a (residual) convolutional neural network (CNN) with an appropriate prior over the weights and biases is a Gaussian process (GP) in the limit of infinitely many convolutional filters, extending similar results for dense networks. For a CNN, the equivalent kernel can be computed exactly and, unlike "deep kernels", has very few parameters: only the hyperparameters of the original CNN. Further, we show that this kernel has two properties that allow it to be computed…
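The equivalence claimed above can be checked numerically: draw many independent random instantiations of a CNN with i.i.d. Gaussian weights, and the empirical covariance of its outputs approaches the GP kernel as the number of filters grows. A minimal sketch in NumPy, assuming a toy one-layer 1D convolution with ReLU and a mean-pooled linear readout (the architecture, filter width, and variance choices here are illustrative, not the authors' exact residual CNN):

```python
import numpy as np

def random_cnn_output(x, n_filters, rng):
    """Output of one random conv layer + ReLU + mean-pooled linear readout.

    Weights are i.i.d. N(0, sigma_w^2 / fan_in), matching the scaling used
    in infinite-width limits; sigma_w = 1 and filter width k = 3 are
    arbitrary choices for this sketch.
    """
    k = 3
    W = rng.normal(0.0, 1.0 / np.sqrt(k), size=(n_filters, k))       # conv filters
    v = rng.normal(0.0, 1.0 / np.sqrt(n_filters), size=n_filters)    # readout weights
    # Extract all valid length-k patches of the 1D input signal.
    patches = np.stack([x[i:i + k] for i in range(len(x) - k + 1)])  # (P, k)
    h = np.maximum(patches @ W.T, 0.0)                               # (P, n_filters), ReLU
    return float(h.mean(axis=0) @ v)                                 # scalar output

def mc_kernel(x1, x2, n_filters, n_samples=2000, seed=0):
    """Monte Carlo estimate of the 2x2 output covariance (the GP kernel)."""
    rng = np.random.default_rng(seed)
    outs = np.array([[random_cnn_output(x, n_filters, rng) for x in (x1, x2)]
                     for _ in range(n_samples)])                     # (n_samples, 2)
    return outs.T @ outs / n_samples                                 # E[f(x) f(x')]

x1 = np.sin(np.linspace(0.0, 3.0, 8))
x2 = np.cos(np.linspace(0.0, 3.0, 8))
K = mc_kernel(x1, x2, n_filters=256)
```

Increasing `n_filters` makes the per-sample output distribution closer to Gaussian, while the paper's point is that the limiting kernel itself can be computed exactly without any sampling.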

## 197 Citations

### Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes

- Computer Science · ICLR
- 2019

This work derives an analogous equivalence for multi-layer convolutional neural networks (CNNs) both with and without pooling layers, and introduces a Monte Carlo method to estimate the GP corresponding to a given neural network architecture, even in cases where the analytic form has too many terms to be computationally feasible.

### Enhanced Convolutional Neural Kernels

- Computer Science
- 2019

The resulting kernel, CNN-GP with LAP and horizontal flip data augmentation, achieves 89% accuracy, matching the performance of AlexNet, and outperforms the best previous classifier that is not a trained neural network (Mairal, 2016).

### Approximation and Learning with Deep Convolutional Models: a Kernel Perspective

- Computer Science · ICLR
- 2022

This paper shows that the RKHS consists of additive models of interaction terms between patches, and that its norm encourages spatial similarities between these terms through pooling layers, and provides generalization bounds which illustrate how pooling and patches yield improved sample complexity guarantees when the target function presents such regularities.

### On Approximation in Deep Convolutional Networks: a Kernel Perspective

- Computer Science · ArXiv
- 2021

It is found that while expressive kernels operating on input patches are important at the first layer, simpler polynomial kernels can suffice in higher layers for good performance, and a precise functional description of the RKHS and its regularization properties is provided.

### Enhanced Convolutional Neural Tangent Kernels

- Computer Science · ArXiv
- 2019

The resulting kernel, CNN-GP with LAP and horizontal flip data augmentation, achieves 89% accuracy, matching the performance of AlexNet, which is the best such result the authors know of for a classifier that is not a trained neural network.

### Approximate Inference Turns Deep Networks into Gaussian Processes

- Computer Science · NeurIPS
- 2019

This paper shows that certain Gaussian posterior approximations for Bayesian DNNs are equivalent to GP posteriors, that a GP kernel and a nonlinear feature map can be obtained while training a DNN, and that the resulting kernel is the neural tangent kernel.

### Correlated Weights in Infinite Limits of Deep Convolutional Neural Networks

- Computer Science · UAI
- 2021

Empirical evaluation of infinitely wide convolutional neural networks shows that optimal performance is achieved between the two extremes, indicating that correlations between weights can be useful.

### A Bayesian Perspective on the Deep Image Prior

- Computer Science · 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019

It is shown that the deep image prior is asymptotically equivalent to a stationary Gaussian process prior in the limit as the number of channels in each layer of the network goes to infinity; the corresponding kernel is derived, which informs a Bayesian approach to inference.

### Incorporating Prior Knowledge into Neural Networks through an Implicit Composite Kernel

- Computer Science · ArXiv
- 2022

This work proposes to blend the strengths of deep learning and the clear modeling capabilities of GPs by using a composite kernel that combines a kernel implicitly defined by a neural network with a second kernel function chosen to model known properties (e.g., seasonality).

## References

Showing 1–10 of 37 references

### Convolutional Gaussian Processes

- Computer Science · NIPS
- 2017

It is shown how the marginal likelihood can be used to find an optimal weighting between convolutional and RBF kernels to further improve performance; the authors hope this illustration of the usefulness of the marginal likelihood will help automate the discovery of architectures in larger models.

### Deep Neural Networks as Gaussian Processes

- Computer Science · ICLR
- 2018

The exact equivalence between infinitely wide deep networks and GPs is derived, and it is found that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks.
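The equivalence for dense networks summarized above rests on a layerwise kernel recursion; a sketch in standard infinite-width notation (the symbols $\sigma_w$, $\sigma_b$, $\phi$, and $K^{(l)}$ are ours, not taken from this page):

```latex
% Base case: inputs x, x' of dimension d; weight/bias variances \sigma_w^2, \sigma_b^2
K^{(0)}(x, x') = \sigma_b^2 + \sigma_w^2 \, \frac{x \cdot x'}{d}

% Recursion: each hidden layer applies the nonlinearity \phi under the previous kernel
K^{(l)}(x, x') = \sigma_b^2 + \sigma_w^2 \,
  \mathbb{E}_{f \sim \mathcal{GP}\left(0,\, K^{(l-1)}\right)}
  \left[ \phi(f(x)) \, \phi(f(x')) \right]

% For \phi = \mathrm{ReLU}, the expectation has the arc-cosine closed form
\mathbb{E}[\phi(u)\,\phi(v)] = \frac{\sqrt{K_{xx} K_{x'x'}}}{2\pi}
  \left( \sin\theta + (\pi - \theta)\cos\theta \right),
\qquad
\theta = \arccos\!\frac{K_{xx'}}{\sqrt{K_{xx} K_{x'x'}}}
```

The convolutional case in the main paper generalizes this recursion from scalar pre-activations to covariances over spatial locations.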

### Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks

- Computer Science · ICML
- 2018

This work demonstrates that it is possible to train vanilla CNNs with ten thousand layers or more simply by using an appropriate initialization scheme, and presents an algorithm for generating such random initial orthogonal convolution kernels.

### ImageNet classification with deep convolutional neural networks

- Computer Science · Commun. ACM
- 2012

A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

### A Gaussian Process perspective on Convolutional Neural Networks

- Computer Science · ArXiv
- 2018

In this paper we cast the well-known convolutional neural network in a Gaussian process perspective. In this way we hope to gain additional insights into the performance of convolutional networks, in…

### Deep Kernel Learning

- Computer Science · AISTATS
- 2016

We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods. Specifically, we transform the inputs…

### Wide Residual Networks

- Computer Science · BMVC
- 2016

This paper conducts a detailed experimental study of the architecture of ResNet blocks and proposes a novel architecture in which the depth of residual networks is decreased and their width increased; the resulting network structures, called wide residual networks (WRNs), are far superior to their commonly used thin and very deep counterparts.

### Deep Residual Learning for Image Recognition

- Computer Science · 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

### Gaussian Process Behaviour in Wide Deep Neural Networks

- Computer Science · ICLR
- 2018

It is shown that, under broad conditions, as the architecture is made increasingly wide, the implied random function converges in distribution to a Gaussian process, formalising and extending existing results by Neal (1996) to deep networks.

### Identity Mappings in Deep Residual Networks

- Computer Science · ECCV
- 2016

The propagation formulations behind the residual building blocks suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation.