Corpus ID: 52019231

Deep Convolutional Networks as shallow Gaussian Processes

@article{GarrigaAlonso2018DeepCN,
  title={Deep Convolutional Networks as shallow Gaussian Processes},
  author={Adri{\`a} Garriga-Alonso and Laurence Aitchison and Carl Edward Rasmussen},
  journal={ArXiv},
  year={2018},
  volume={abs/1808.05587}
}
We show that the output of a (residual) convolutional neural network (CNN) with an appropriate prior over the weights and biases is a Gaussian process (GP) in the limit of infinitely many convolutional filters, extending similar results for dense networks. For a CNN, the equivalent kernel can be computed exactly and, unlike "deep kernels", has very few parameters: only the hyperparameters of the original CNN. Further, we show that this kernel has two properties that allow it to be computed… 
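
Since the central claim here is the infinite-filter GP limit with an exactly computable kernel, a small numerical illustration may help. The sketch below is not the authors' code; the architecture (a single random ReLU convolutional layer over 1-D inputs followed by a dense readout), the prior variances, and all names are assumptions chosen for brevity. It checks two things: the output covariance over prior draws matches the analytic arc-cosine kernel at any width (it is exact in expectation), while the excess kurtosis of the output, which is zero for a Gaussian, shrinks toward zero as the number of filters C grows, illustrating convergence to a GP.

```python
# Rough numerical check (not code from the paper) of the infinite-filter limit.
# One random ReLU convolutional layer over a 1-D input, followed by a dense
# readout; all hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

x1 = rng.normal(size=8)   # two fixed single-channel 1-D inputs
x2 = rng.normal(size=8)

K_SIZE = 3      # filter width
SIGMA_W2 = 2.0  # prior variance of conv weights (scaled by fan-in)
SIGMA_B2 = 0.1  # prior variance of conv biases
SIGMA_V2 = 1.0  # prior variance of readout weights (scaled by fan-in)


def conv_patches(x):
    """Sliding windows of x stacked as rows, shape (P, K_SIZE)."""
    P = len(x) - K_SIZE + 1
    return np.stack([x[p:p + K_SIZE] for p in range(P)])


def sample_output_pair(xa, xb, n_filters, n_samples):
    """Scalar outputs (f(xa), f(xb)) of the random network, sharing each weight draw."""
    pa, pb = conv_patches(xa), conv_patches(xb)
    P = pa.shape[0]
    W = rng.normal(0.0, np.sqrt(SIGMA_W2 / K_SIZE), (n_samples, n_filters, K_SIZE))
    b = rng.normal(0.0, np.sqrt(SIGMA_B2), (n_samples, n_filters, 1))
    V = rng.normal(0.0, np.sqrt(SIGMA_V2 / (n_filters * P)), (n_samples, n_filters, P))
    ha = np.maximum(np.einsum('sck,pk->scp', W, pa) + b, 0.0)  # ReLU conv features
    hb = np.maximum(np.einsum('sck,pk->scp', W, pb) + b, 0.0)
    return np.einsum('scp,scp->s', V, ha), np.einsum('scp,scp->s', V, hb)


def analytic_kernel(xa, xb):
    """Exact GP covariance of this one-layer architecture (arc-cosine formula)."""
    pa, pb = conv_patches(xa), conv_patches(xb)
    k_aa = SIGMA_B2 + SIGMA_W2 / K_SIZE * np.sum(pa * pa, axis=1)
    k_bb = SIGMA_B2 + SIGMA_W2 / K_SIZE * np.sum(pb * pb, axis=1)
    k_ab = SIGMA_B2 + SIGMA_W2 / K_SIZE * np.sum(pa * pb, axis=1)
    theta = np.arccos(np.clip(k_ab / np.sqrt(k_aa * k_bb), -1.0, 1.0))
    e_relu = np.sqrt(k_aa * k_bb) / (2 * np.pi) * (np.sin(theta) + (np.pi - theta) * np.cos(theta))
    return SIGMA_V2 * np.mean(e_relu)  # readout averages over spatial positions


print(f"analytic kernel K(x1, x2) = {analytic_kernel(x1, x2):.3f}")
for C in (1, 8, 32, 128):
    fa, fb = sample_output_pair(x1, x2, C, n_samples=8000)
    cov = np.mean(fa * fb)                            # unbiased at any width
    kurt = np.mean(fa**4) / np.mean(fa**2)**2 - 3.0   # -> 0 (Gaussian) as C grows
    print(f"C = {C:4d}  empirical Cov = {cov:.3f}  excess kurtosis = {kurt:+.2f}")
```

A deeper network without pooling would iterate this patch-covariance recursion layer by layer, which is in spirit how the exact kernel is computed.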

Citations

Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes

This work derives an analogous equivalence for multi-layer convolutional neural networks (CNNs) both with and without pooling layers, and introduces a Monte Carlo method to estimate the GP corresponding to a given neural network architecture, even in cases where the analytic form has too many terms to be computationally feasible.
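
The Monte Carlo estimator described above is simple enough to sketch generically: sample the network's weights from the prior many times, evaluate every draw on all inputs of interest, and average the outer products of the (zero-mean) outputs. The helper below is a hypothetical illustration under assumed shapes and priors (`mc_nngp_kernel`, `make_random_cnn`, and all hyperparameters are invented for the sketch), not the cited work's implementation.

```python
# Generic Monte Carlo estimate of the equivalent GP kernel: average outer
# products of the outputs of many independently sampled finite networks.
# The toy architecture below (two 1-D conv + ReLU layers, global average
# pooling, linear readout) and all sizes are assumptions for illustration.
import numpy as np


def mc_nngp_kernel(make_random_net, inputs, n_samples, rng):
    """Estimate K[i, j] = E_w[f_w(x_i) * f_w(x_j)] over prior draws w."""
    n = len(inputs)
    K = np.zeros((n, n))
    for _ in range(n_samples):
        f = make_random_net(rng)                 # one prior draw, shared by all inputs
        outs = np.array([f(x) for x in inputs])  # zero-mean outputs under the prior
        K += np.outer(outs, outs)
    return K / n_samples


def make_random_cnn(rng, n_filters=128, k=3):
    """Sample one finite CNN from the prior; returns a function x -> scalar."""
    def conv(h, W, b):
        # h: (channels_in, length), W: (channels_out, channels_in, k)
        P = h.shape[1] - k + 1
        patches = np.stack([h[:, p:p + k] for p in range(P)], axis=1)  # (c_in, P, k)
        return np.einsum('oik,ipk->op', W, patches) + b[:, None]

    W1 = rng.normal(0.0, np.sqrt(2.0 / (1 * k)), (n_filters, 1, k))
    b1 = rng.normal(0.0, 0.1, n_filters)
    W2 = rng.normal(0.0, np.sqrt(2.0 / (n_filters * k)), (n_filters, n_filters, k))
    b2 = rng.normal(0.0, 0.1, n_filters)
    v = rng.normal(0.0, np.sqrt(1.0 / n_filters), n_filters)

    def f(x):                                    # x: (length,) single-channel input
        h = np.maximum(conv(x[None, :], W1, b1), 0.0)
        h = np.maximum(conv(h, W2, b2), 0.0)
        return v @ h.mean(axis=1)                # global average pooling + readout
    return f


rng = np.random.default_rng(0)
xs = [rng.normal(size=16) for _ in range(4)]
print(np.round(mc_nngp_kernel(make_random_cnn, xs, n_samples=2000, rng=rng), 3))
```

The estimator needs many prior draws to tame Monte Carlo noise, which is the trade-off against the kernels that can be computed analytically.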

Approximation and Learning with Deep Convolutional Models: a Kernel Perspective

This paper shows that the RKHS consists of additive models of interaction terms between patches, that its norm encourages spatial similarity between these terms through pooling layers, and provides generalization bounds illustrating how pooling and patches yield improved sample-complexity guarantees when the target function exhibits such regularities.

On Approximation in Deep Convolutional Networks: a Kernel Perspective

It is found that while expressive kernels operating on input patches are important at the first layer, simpler polynomial kernels can suffice in higher layers for good performance, and a precise functional description of the RKHS and its regularization properties is provided.

Enhanced Convolutional Neural Tangent Kernels

The resulting kernel, CNN-GP with LAP (local average pooling) and horizontal-flip data augmentation, achieves 89% accuracy, matching the performance of AlexNet; the authors report this as the best result they know of for a classifier that is not a trained neural network.

Translation Insensitivity for Deep Convolutional Gaussian Processes

A translation-insensitive convolutional kernel is introduced, which removes the restriction of requiring identical outputs for identical patch inputs, and it is shown empirically that this convolutional kernel improves performance in both shallow and deep models.

Approximate Inference Turns Deep Networks into Gaussian Processes

This paper shows that certain Gaussian posterior approximations for Bayesian DNNs are equivalent to GP posteriors, that a GP kernel and a nonlinear feature map can be obtained while training a DNN, and that the resulting kernel is the neural tangent kernel.

Correlated Weights in Infinite Limits of Deep Convolutional Neural Networks

Empirical evaluation of the infinitely wide convolutional neural networks shows that optimal performance is achieved between the two extremes of fully independent and fully correlated weights, indicating that weight correlations can be useful.

A Bayesian Perspective on the Deep Image Prior

It is shown that the deep image prior is asymptotically equivalent to a stationary Gaussian process prior in the limit as the number of channels in each layer of the network goes to infinity, and the corresponding kernel is derived, which informs a Bayesian approach to inference.
...

References

Showing 1-10 of 37 references

Deep Gaussian Processes with Convolutional Kernels

Convolutional DGP (CDGP) models are developed which effectively capture image-level features through the use of convolutional kernels, thereby opening the way for applying DGPs to computer vision tasks.

Deep Neural Networks as Gaussian Processes

The exact equivalence between infinitely wide deep networks and GPs is derived and it is found that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks.

Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks

This work demonstrates that it is possible to train vanilla CNNs with ten thousand layers or more simply by using an appropriate initialization scheme, and presents an algorithm for generating the required random initial orthogonal convolution kernels.

ImageNet classification with deep convolutional neural networks

A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

A Gaussian Process perspective on Convolutional Neural Networks

In this paper we cast the well-known convolutional neural network in a Gaussian process perspective. In this way we hope to gain additional insights into the performance of convolutional networks…

Deep Kernel Learning

We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods. Specifically, we transform the inputs…

Wide Residual Networks

This paper conducts a detailed experimental study on the architecture of ResNet blocks and proposes a novel architecture in which the depth of residual networks is decreased and their width increased; the resulting network structures, called wide residual networks (WRNs), are far superior to their commonly used thin and very deep counterparts.

Deep Residual Learning for Image Recognition

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

Gaussian Process Behaviour in Wide Deep Neural Networks

It is shown that, under broad conditions, as the authors make the architecture increasingly wide, the implied random function converges in distribution to a Gaussian process, formalising and extending existing results by Neal (1996) to deep networks.

Identity Mappings in Deep Residual Networks

The propagation formulations behind the residual building blocks suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation.