• Corpus ID: 235368380

Approximation and Learning with Deep Convolutional Models: a Kernel Perspective

  • A. Bietti
  • Published in International Conference on Learning Representations (ICLR)
    19 February 2021
  • Computer Science
The empirical success of deep convolutional networks on tasks involving high-dimensional data such as images or audio suggests that they can efficiently approximate certain functions that are well-suited for such tasks. In this paper, we study this through the lens of kernel methods, by considering simple hierarchical kernels with two or three convolution and pooling layers, inspired by convolutional kernel networks. These achieve good empirical performance on standard vision datasets, while…
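The abstract above describes hierarchical kernels built from convolution and pooling layers. As a rough illustration of the general pattern (not the paper's exact construction), the sketch below composes patch extraction, a degree-1 arc-cosine dot-product kernel on patches, and average pooling for 1-D signals; all function names and parameter choices here are illustrative assumptions.

```python
import numpy as np

def extract_patches(x, size):
    # All sliding patches of a 1-D signal x: shape (n_positions, size).
    return np.stack([x[i:i + size] for i in range(len(x) - size + 1)])

def arccos1_kernel(U, V):
    # Degree-1 arc-cosine kernel between patch sets (the ReLU feature kernel):
    #   k(u, v) = |u||v| * kappa(cos(u, v)),
    #   kappa(t) = (t * (pi - arccos t) + sqrt(1 - t^2)) / pi.
    nu = np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
    nv = np.linalg.norm(V, axis=1, keepdims=True) + 1e-12
    t = np.clip((U / nu) @ (V / nv).T, -1.0, 1.0)
    kappa = (t * (np.pi - np.arccos(t)) + np.sqrt(1.0 - t ** 2)) / np.pi
    return (nu @ nv.T) * kappa

def conv_kernel(x, y, patch_size=3, pool=2):
    # One convolution layer: patch kernel, average pooling over positions,
    # then aggregation into a scalar kernel value.
    K = arccos1_kernel(extract_patches(x, patch_size),
                       extract_patches(y, patch_size))
    n = (K.shape[0] // pool) * pool
    m = (K.shape[1] // pool) * pool
    Kp = K[:n, :m].reshape(n // pool, pool, m // pool, pool).mean(axis=(1, 3))
    return float(Kp.sum())
```

Since the patch kernel is positive semidefinite and pooling is a nonnegative averaging, the resulting kernel is symmetric and satisfies k(x, x) ≥ 0.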


Neural Contextual Bandits without Regret

This work analyzes NTK-UCB, a kernelized bandit optimization algorithm employing the Neural Tangent Kernel, and bounds its regret in terms of the NTK maximum information gain γ_T, a complexity parameter capturing the difficulty of learning.
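The NTK-UCB summary above follows the familiar kernelized-UCB template: fit kernel ridge regression on the observed rewards, then pick the arm maximizing the posterior mean plus a scaled posterior-width bonus. A minimal generic sketch with an RBF kernel standing in for the NTK (names, constants, and the kernel choice are illustrative, not the paper's algorithm):

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    # Gaussian RBF kernel matrix between row sets X (n, d) and Y (m, d).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_ucb(actions, reward_fn, T=60, lam=1.0, beta=2.0, seed=0):
    # Kernelized UCB over a finite arm set: kernel ridge regression
    # posterior mean plus an exploration bonus from the posterior width.
    rng = np.random.default_rng(seed)
    X, y = [], []
    for _ in range(T):
        if not X:
            idx = int(rng.integers(len(actions)))  # first pull: random arm
        else:
            Xa, ya = np.asarray(X), np.asarray(y)
            Kinv = np.linalg.inv(rbf(Xa, Xa) + lam * np.eye(len(X)))
            k_star = rbf(actions, Xa)                      # (n_arms, t)
            mu = k_star @ Kinv @ ya                        # posterior mean
            var = 1.0 - np.einsum('ij,jk,ik->i', k_star, Kinv, k_star)
            idx = int(np.argmax(mu + beta * np.sqrt(np.maximum(var, 0.0))))
        X.append(actions[idx])
        y.append(reward_fn(actions[idx]))
    return np.asarray(X), np.asarray(y)
```

With a noise-free quadratic reward over a small grid of arms, the exploration bonus drives the loop to cover the grid and then concentrate near the maximizer.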

How Wide Convolutional Neural Networks Learn Hierarchical Tasks

It is shown that the spectrum of the corresponding kernel and its asymptotics inherit the hierarchical structure of the network, which implies that despite their hierarchical structure, the functions generated by deep CNNs are too rich to be efficiently learnable in high dimension.

What can be learnt with wide convolutional neural networks?

Interestingly, it is found that, despite their hierarchical structure, the functions generated by deep CNNs are too rich to be efficiently learnable in high dimension.

Eigenspace Restructuring: a Principle of Space and Frequency in Neural Networks

It is shown that the topologies from deep convolutional networks (CNNs) restructure the associated eigenspaces into finer subspaces, and a sharp characterization of the generalization error for infinite-width CNNs of any depth in the high-dimensional setting is proved.

The SSL Interplay: Augmentations, Inductive Bias, and Generalization

This work studies the complex interplay between the choice of data augmentation, network architecture, and training algorithm in self-supervised learning, with a precise analysis of generalization performance on both pretraining and downstream tasks in a theory-friendly setup.

Strong inductive biases provably prevent harmless interpolation

This paper argues that the degree to which interpolation is harmless hinges upon the strength of an estimator's inductive bias, i.e., how heavily the estimator favors solutions with a certain structure, and establishes tight non-asymptotic bounds for high-dimensional kernel regression that reflect this phenomenon for convolutional kernels.

On the Universal Approximation Property of Deep Fully Convolutional Neural Networks

It is proved that deep residual fully convolutional networks and their continuous-layer counterpart can achieve universal approximation of shift-invariant or equivariant functions at constant channel width.

Transfer Learning with Kernel Methods

It is shown that transferring modern kernels trained on large-scale image datasets can result in substantial performance increases compared to using the same kernel trained directly on the target task, and that transfer-learned kernels allow more accurate prediction of the effect of drugs on cancer cell lines.

Synergy and Symmetry in Deep Learning: Interactions between the Data, Model, and Inference Algorithm

This paper analyzes the triplet (D, M, I) of data, model, and inference algorithm as an integrated system and identifies important synergies that help mitigate the curse of dimensionality.

A view of mini-batch SGD via generating functions: conditions of convergence, phase transitions, benefit from negative momenta

A new analytic framework is developed to analyze noise-averaged properties of mini-batch SGD for linear models at constant learning rates, momenta, and batch sizes, finding that the SGD dynamics exhibit several convergent and divergent regimes depending on the spectral distribution of the problem.

Neural Kernels Without Tangents

Using well-established feature-space tools such as direct sum, averaging, and moment lifting, an algebra for creating "compositional" kernels from bags of features is presented, whose operations correspond to many of the building blocks of neural tangent kernels (NTKs).

Breaking the Curse of Dimensionality with Convex Neural Networks

  • F. Bach
  • Computer Science
    J. Mach. Learn. Res.
  • 2017
This work considers neural networks with a single hidden layer and non-decreasing homogeneous activation functions such as rectified linear units, and shows that they are adaptive to unknown underlying linear structures, such as the dependence on the projection of the input variables onto a low-dimensional subspace.

On Exact Computation with an Infinitely Wide Neural Net

The current paper gives the first efficient exact algorithm for computing the extension of NTK to convolutional neural nets, called Convolutional NTK (CNTK), as well as an efficient GPU implementation of this algorithm.
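In the fully-connected case, the infinite-width NTK of a two-layer ReLU network already has a well-known closed form, which a finite-width Monte Carlo estimate should approach; the sketch below checks this agreement. The 1/2 normalization matches the particular parameterization written in the comments, an illustrative choice rather than the paper's CNTK construction.

```python
import numpy as np

def ntk_2layer(x, y):
    # Closed-form infinite-width NTK of a two-layer ReLU network
    # f(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j . x), training both layers,
    # with a_j ~ N(0, 1) and w_j ~ N(0, I):
    #   Theta(x, y) = 0.5 * ( <x,y> * kappa0(c) + |x||y| * kappa1(c) ),
    # where c is the cosine of the angle between x and y.
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    c = np.clip(x @ y / (nx * ny), -1.0, 1.0)
    k0 = (np.pi - np.arccos(c)) / np.pi
    k1 = (c * (np.pi - np.arccos(c)) + np.sqrt(1.0 - c ** 2)) / np.pi
    return 0.5 * ((x @ y) * k0 + nx * ny * k1)

def ntk_monte_carlo(x, y, m=200_000, seed=0):
    # Finite-width estimate: average the gradient inner products over
    # random first-layer weights w_j and second-layer weights a_j.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, x.shape[0]))
    a = rng.standard_normal(m)
    px, py = W @ x, W @ y
    grad_w = a ** 2 * (px > 0) * (py > 0) * (x @ y)   # first-layer term
    grad_a = np.maximum(px, 0) * np.maximum(py, 0)    # second-layer term
    return float((grad_w + grad_a).mean())
```

On unit vectors the two quantities agree to a few decimal places, with the Monte Carlo error shrinking as the width m grows.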

Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review

An emerging body of theoretical results on deep learning including the conditions under which it can be exponentially better than shallow learning are reviewed, together with new results, open problems and conjectures.

Regularization with Dot-Product Kernels

This paper gives an explicit functional form for the feature map by calculating its eigenfunctions and eigenvalues and shows that if the kernel is analytic (i.e. can be expanded in a Taylor series), all expansion coefficients have to be nonnegative.
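The nonnegativity condition above can be probed numerically: a dot-product kernel k(x, y) = f(<x, y>) with a negative expansion coefficient can fail to be positive semidefinite on the sphere. An illustrative check (function names are ours):

```python
import numpy as np

def min_gram_eig(f, X):
    # Smallest eigenvalue of the Gram matrix of the dot-product kernel
    # k(x, y) = f(<x, y>) on unit-normalized inputs.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return float(np.linalg.eigvalsh(f(Xn @ Xn.T)).min())

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5))

# exp(u) = sum_n u^n / n! has all-nonnegative coefficients,
# so the Gram matrix is positive semidefinite.
psd_eig = min_gram_eig(np.exp, X)

# f(u) = u - u^3 has a negative coefficient; here f(1) = 0 forces a
# zero trace, so a nonzero Gram matrix must have a negative eigenvalue.
bad_eig = min_gram_eig(lambda u: u - u ** 3, X)
```

The first kernel's smallest eigenvalue is nonnegative up to floating-point error, while the second's is strictly negative.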

Learning Theory from First Principles (draft)

  • URL https://www.di.ens.fr/~fbach/ltfp_book.pdf
  • 2021

High-dimensional statistics: A non-asymptotic viewpoint, volume 48

  • 2019

Learning with invariances in random features and kernel models

This work characterizes the test error of invariant methods in a high-dimensional regime in which the sample size and the number of hidden units scale as polynomials in the dimension, and shows that exploiting invariance in the architecture saves a factor of d in achieving the same test error as unstructured architectures.