Corpus ID: 211989288

Neural Kernels Without Tangents

@inproceedings{Shankar2020NeuralKW,
  title={Neural Kernels Without Tangents},
  author={Vaishaal Shankar and Alexander W. Fang and Wenshuo Guo and Sara Fridovich-Keil and Ludwig Schmidt and Jonathan Ragan-Kelley and Benjamin Recht},
  booktitle={ICML},
  year={2020}
}
We investigate the connections between neural networks and simple building blocks in kernel space. In particular, using well established feature space tools such as direct sum, averaging, and moment lifting, we present an algebra for creating "compositional" kernels from bags of features. We show that these operations correspond to many of the building blocks of "neural tangent kernels (NTK)". Experimentally, we show that there is a correlation in test error between neural network architectures… 
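To make the three operations concrete, here is a minimal numpy sketch of the kind of kernel-space building blocks the abstract describes (an illustration with hypothetical helper names, not the authors' released code): concatenating feature spaces adds kernels, averaging a bag of features averages the element-level kernel values, and a Gaussian lifting (one common form of moment lifting) maps a kernel to a new one through its induced RKHS distance.

import numpy as np

def direct_sum(K1, K2):
    # concatenating two feature maps phi1 (+) phi2 adds the corresponding kernels
    return K1 + K2

def bag_average(K_elems, bags_a, bags_b):
    # averaging a bag of features corresponds to averaging element-level kernel values;
    # K_elems[i, j] is the kernel between elements i and j, and bags_a / bags_b are
    # lists of index arrays, one per bag (e.g. the patches belonging to one image)
    K = np.zeros((len(bags_a), len(bags_b)))
    for a, idx_a in enumerate(bags_a):
        for b, idx_b in enumerate(bags_b):
            K[a, b] = K_elems[np.ix_(idx_a, idx_b)].mean()
    return K

def gaussian_lift(K, gamma=1.0):
    # one common lifting: map a square Gram matrix K to exp(-gamma * d_K^2), where
    # d_K(x, y)^2 = k(x, x) + k(y, y) - 2 k(x, y) is the induced RKHS distance
    d2 = np.diag(K)[:, None] + np.diag(K)[None, :] - 2.0 * K
    return np.exp(-gamma * d2)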

Random Features for the Neural Tangent Kernel

This work proposes an efficient feature map construction for the NTK of fully connected ReLU networks, which makes it applicable to large-scale datasets, and shows that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.

Fast Neural Kernel Embeddings for General Activations

A fast sketching method that approximates multi-layer Neural Network Gaussian Process (NNGP) kernel and Neural Tangent Kernel (NTK) matrices for a wide range of activation functions, going beyond the commonly analyzed ReLU activation.

Learning with convolution and pooling operations in kernel methods

This paper considers the stylized setting of covariates (image pixels) uniformly distributed on the hypercube, fully characterizes the RKHS of kernels composed of single layers of convolution, pooling, and downsampling operations, and quantifies how choosing an architecture adapted to the target function leads to a large improvement in sample complexity.

When do neural networks outperform kernel methods?

It is shown that the curse of dimensionality suffered by kernel methods becomes milder if the covariates display the same low-dimensional structure as the target function, and a spiked covariate model is presented that captures, in a unified framework, both behaviors observed in earlier work.

Classifying high-dimensional Gaussian mixtures: Where kernel methods fail and neural networks succeed

It is theoretically shown that two-layer neural networks (2LNN) with only a few neurons can beat the performance of kernel learning on a simple Gaussian mixture classification task and illustrates how over-parametrising the neural network leads to faster convergence, but does not improve its final performance.

Weighted Neural Tangent Kernel: A Generalized and Improved Network-Induced Kernel

The Weighted Neural Tangent Kernel (WNTK) is introduced as a generalized and improved tool that can capture an over-parameterized NN’s training dynamics under different optimizers, and the stability of the WNTK at initialization and during training is proved.

On Approximation in Deep Convolutional Networks: a Kernel Perspective

It is found that while expressive kernels operating on input patches are important at the first layer, simpler polynomial kernels can suffice in higher layers for good performance, and a precise functional description of the RKHS and its regularization properties is provided.

Approximation and Learning with Deep Convolutional Models: a Kernel Perspective

This paper shows that the RKHS consists of additive models of interaction terms between patches and that its norm encourages spatial similarities between these terms through pooling layers; it also provides generalization bounds illustrating how pooling and patches yield improved sample complexity guarantees when the target function presents such regularities.

Limitations of the NTK for Understanding Generalization in Deep Learning

This work studies NTKs through the lens of scaling laws, proves that they fall short of explaining important aspects of neural network generalization, and establishes concrete limitations of the NTK approach for understanding the generalization of real networks on natural datasets.

Finite Versus Infinite Neural Networks: an Empirical Study

Improved best practices for using NNGP and NT kernels for prediction are developed, including a novel ensembling technique that achieves state-of-the-art results on CIFAR-10 classification for kernels corresponding to each architecture class the authors consider.
...

References

Showing 1–10 of 37 references

Enhanced Convolutional Neural Tangent Kernels

The resulting kernel, CNN-GP with LAP and horizontal flip data augmentation, achieves 89% accuracy, matching the performance of AlexNet, which is the best such result the authors know of for a classifier that is not a trained neural network.

Convolutional Kernel Networks

This paper proposes a new type of convolutional neural network (CNN) whose invariance is encoded by a reproducing kernel, and bridges a gap between the neural network literature and kernels, which are natural tools to model invariance.

End-to-End Kernel Learning with Supervised Convolutional Kernel Networks

A new image representation based on a multilayer kernel machine that achieves reasonably competitive performance for image classification on some standard "deep learning" datasets and also for image super-resolution, demonstrating the applicability of the approach to a large variety of image-related tasks.

Scalable Kernel Methods via Doubly Stochastic Gradients

An approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients", based on the fact that many kernel methods can be expressed as convex optimization problems; it can readily bring kernel methods to regimes that are dominated by neural nets.
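As a rough illustration of the doubly stochastic idea (a sketch under simplifying assumptions for a Gaussian kernel and squared loss, not the paper's implementation; function names are illustrative), each update below draws two sources of randomness, a random training point and a freshly sampled random Fourier feature, and takes a functional-gradient step for kernel ridge regression.

import numpy as np

def rff(x, w, b):
    # random Fourier feature: E_{w,b}[rff(x) * rff(y)] approximates a Gaussian kernel
    return np.sqrt(2.0) * np.cos(x @ w + b)

def doubly_stochastic_fit(X, y, n_iters=2000, gamma=1.0, reg=1e-3, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Ws, Bs, alphas = [], [], []                          # one coefficient per sampled feature
    for t in range(n_iters):
        i = rng.integers(n)                              # randomness 1: data point
        w = rng.normal(scale=np.sqrt(2.0 * gamma), size=d)  # randomness 2: random feature
        b = rng.uniform(0.0, 2.0 * np.pi)
        f_xi = sum(a * rff(X[i], wj, bj) for a, wj, bj in zip(alphas, Ws, Bs))
        lr = step / (1.0 + t)
        alphas = [(1.0 - lr * reg) * a for a in alphas]  # shrinkage from the ridge term
        alphas.append(-lr * (f_xi - y[i]) * rff(X[i], w, b))
        Ws.append(w)
        Bs.append(b)
    return Ws, Bs, alphas

def ds_predict(x, Ws, Bs, alphas):
    # evaluate the learned function at a new point x
    return sum(a * rff(x, wj, bj) for a, wj, bj in zip(alphas, Ws, Bs))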

A Kernel Theory of Modern Data Augmentation

This paper provides a general model of augmentation as a Markov process, shows that kernels appear naturally with respect to this model even when kernel classifiers are not employed, and analyzes more directly the effect of augmentation on kernel classifiers.

Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes

This work derives an analogous equivalence between infinitely wide networks and Gaussian processes for multi-layer convolutional neural networks (CNNs), both with and without pooling layers, and introduces a Monte Carlo method to estimate the GP corresponding to a given neural network architecture, even in cases where the analytic form has too many terms to be computationally feasible.
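For intuition, here is a minimal numpy sketch of that Monte Carlo idea (a fully connected ReLU network stands in for a CNN for brevity; the function names, widths, and variance parameters are illustrative, not the paper's code): draw many random initializations and estimate the output covariance empirically.

import numpy as np

def random_relu_net(X, widths, rng, sigma_w=1.0, sigma_b=0.1):
    # one random draw of a finite-width ReLU network (NNGP-style parameterization)
    h = X
    for w_in, w_out in zip(widths[:-1], widths[1:]):
        W = rng.normal(scale=sigma_w / np.sqrt(w_in), size=(w_in, w_out))
        b = rng.normal(scale=sigma_b, size=w_out)
        h = np.maximum(h @ W + b, 0.0)
    v = rng.normal(scale=sigma_w / np.sqrt(widths[-1]), size=(widths[-1], 1))
    return (h @ v)[:, 0]                      # scalar output for every row of X

def monte_carlo_nngp(X, widths, n_samples=2000, seed=0):
    # empirical covariance of the network output over random initializations
    rng = np.random.default_rng(seed)
    outs = np.stack([random_relu_net(X, widths, rng) for _ in range(n_samples)])
    return outs.T @ outs / n_samples          # (n, n) estimate of the NNGP kernel

For example, monte_carlo_nngp(X, [X.shape[1], 512, 512]) returns an n × n estimate that approaches the analytic NNGP kernel as the widths and the number of samples grow.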

On Exact Computation with an Infinitely Wide Neural Net

The current paper gives the first efficient exact algorithm for computing the extension of the NTK to convolutional neural nets, called the Convolutional NTK (CNTK), as well as an efficient GPU implementation of this algorithm.

Deep Neural Networks as Gaussian Processes

The exact equivalence between infinitely wide deep networks and GPs is derived and it is found that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks.
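For reference, the covariance of the limiting GP is given by a layerwise recursion (stated here in standard NNGP notation with activation $\phi$ and weight/bias variances $\sigma_w^2$, $\sigma_b^2$; a statement of the known result, not a derivation):

$$K^{(0)}(x, x') = \sigma_b^2 + \sigma_w^2 \, \frac{x^\top x'}{d}, \qquad K^{(\ell)}(x, x') = \sigma_b^2 + \sigma_w^2 \, \mathbb{E}_{(u, v) \sim \mathcal{N}\left(0, \Lambda^{(\ell-1)}\right)}\left[\phi(u)\,\phi(v)\right],$$

where $\Lambda^{(\ell-1)}$ is the $2 \times 2$ covariance matrix with entries $K^{(\ell-1)}(x, x)$, $K^{(\ell-1)}(x, x')$, and $K^{(\ell-1)}(x', x')$.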

Wide neural networks of any depth evolve as linear models under gradient descent

This work shows that for wide NNs the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters.
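Concretely, the linearized model is the first-order Taylor expansion of the network output in its parameters around the initialization $\theta_0$:

$$f_{\mathrm{lin}}(x; \theta) = f(x; \theta_0) + \nabla_\theta f(x; \theta_0)^\top (\theta - \theta_0),$$

and under gradient descent on a squared loss the function-space dynamics of $f_{\mathrm{lin}}$ are governed by the (constant) tangent kernel evaluated at $\theta_0$.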

Neural tangent kernel: convergence and generalization in neural networks (invited paper)

This talk introduces the Neural Tangent Kernel formalism, gives a number of results on it, and explains how they provide insight into the dynamics of neural networks during training and into their generalization properties.
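The central object of this formalism, the Neural Tangent Kernel of a network $f(\cdot\,; \theta)$, is the Gram matrix of parameter gradients,

$$\Theta(x, x') = \big\langle \nabla_\theta f(x; \theta), \, \nabla_\theta f(x'; \theta) \big\rangle,$$

which, in the infinite-width limit, converges to a deterministic kernel at initialization and stays constant during training, so that (for squared loss and gradient flow) the network's function-space dynamics reduce to kernel gradient descent with $\Theta$.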