Approximation and Learning with Deep Convolutional Models: a Kernel Perspective
@inproceedings{Bietti2021ApproximationAL,
  title     = {Approximation and Learning with Deep Convolutional Models: a Kernel Perspective},
  author    = {Alberto Bietti},
  booktitle = {International Conference on Learning Representations},
  year      = {2021}
}
The empirical success of deep convolutional networks on tasks involving high-dimensional data such as images or audio suggests that they can efficiently approximate certain functions that are well-suited for such tasks. In this paper, we study this through the lens of kernel methods, by considering simple hierarchical kernels with two or three convolution and pooling layers, inspired by convolutional kernel networks. These achieve good empirical performance on standard vision datasets, while…
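To make the construction concrete, here is a minimal NumPy sketch of a two-layer convolutional kernel of this general flavor, computed between two 1-D multi-channel signals: patch extraction, a dot-product kernel on patches, local average pooling, a second dot-product kernel, and global average pooling. The arc-cosine kernel and the patch and pooling sizes are assumptions made for the illustration, not necessarily the paper's exact choices.

```python
# A minimal sketch of a two-layer convolutional kernel in the spirit described
# above; the arc-cosine kernel and the patch/pooling sizes are illustrative
# assumptions, not the paper's exact construction.
import numpy as np


def patches(x, size):
    """All contiguous patches of a (length, channels) signal, flattened."""
    n = x.shape[0] - size + 1
    return np.stack([x[i:i + size].ravel() for i in range(n)])


def arccos1(cos):
    """Arc-cosine kernel of order 1 as a function of the cosine similarity."""
    cos = np.clip(cos, -1.0, 1.0)
    theta = np.arccos(cos)
    return (np.sin(theta) + (np.pi - theta) * cos) / np.pi


def dot_product_layer(lin_xy, sq_x, sq_y):
    """Apply the homogeneous dot-product kernel entrywise.

    lin_xy: inner products between x-features and y-features;
    sq_x, sq_y: squared norms (self inner products) of those features.
    """
    norms = np.sqrt(np.outer(sq_x, sq_y)) + 1e-12
    return norms * arccos1(lin_xy / norms)


def avg_pool(K, size):
    """Average-pool a cross-Gram matrix along both position axes."""
    def pool_rows(M):
        n = M.shape[0] - size + 1
        return np.stack([M[i:i + size].mean(axis=0) for i in range(n)])
    return pool_rows(pool_rows(K).T).T


def conv_kernel(x, y, patch_size=3, pool_size=2):
    """Two-layer convolutional kernel value k(x, y) for 1-D signals."""
    Px, Py = patches(x, patch_size), patches(y, patch_size)
    sq_x = np.einsum("ij,ij->i", Px, Px)
    sq_y = np.einsum("ij,ij->i", Py, Py)
    # Layer 1: dot-product kernel on raw patches (cross- and self-Grams).
    Kxy = dot_product_layer(Px @ Py.T, sq_x, sq_y)
    Kxx = dot_product_layer(Px @ Px.T, sq_x, sq_x)
    Kyy = dot_product_layer(Py @ Py.T, sq_y, sq_y)
    # Local average pooling of feature maps = averaging the Gram matrices.
    Kxy, Kxx, Kyy = [avg_pool(K, pool_size) for K in (Kxy, Kxx, Kyy)]
    # Layer 2: dot-product kernel on pooled features, then global pooling.
    K2 = dot_product_layer(Kxy, np.diag(Kxx), np.diag(Kyy))
    return K2.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y = rng.normal(size=(16, 3)), rng.normal(size=(16, 3))
    print(conv_kernel(x, y))
```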
16 Citations
Neural Contextual Bandits without Regret
- Computer Science · AISTATS
- 2022
This work analyzes NTK-UCB, a kernelized bandit optimization algorithm employing the Neural Tangent Kernel, and bounds its regret in terms of the NTK maximum information gain γ_T, a complexity parameter capturing the difficulty of learning.
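For context, the maximum information gain and the typical shape of the resulting regret bound can be written as follows; this is a sketch in standard kernelized-UCB notation, not quoted from the paper.

```latex
% Standard kernelized-UCB quantities (notation is ours, not the paper's).
\[
  \gamma_T \;=\; \max_{A \subset \mathcal{X},\; |A| \le T}\;
    \tfrac{1}{2}\,\log\det\!\big(I + \sigma^{-2} K_A\big),
  \qquad
  R_T \;=\; \widetilde{\mathcal{O}}\!\big(\gamma_T \sqrt{T}\big),
\]
where $K_A$ is the NTK Gram matrix of the actions in $A$ and $\sigma^2$ is the
observation-noise variance.
```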
How Wide Convolutional Neural Networks Learn Hierarchical Tasks
- Computer Science · ArXiv
- 2022
It is shown that the spectrum of the corresponding kernel and its asymptotics inherit the hierarchical structure of the network, implying that, despite this structure, the functions generated by deep CNNs are too rich to be efficiently learnable in high dimension.
What can be learnt with wide convolutional neural networks?
- Computer Science
- 2022
Interestingly, it is found that, despite their hierarchical structure, the functions generated by deep CNNs are too rich to be efficiently learnable in high dimension.
Eigenspace Restructuring: a Principle of Space and Frequency in Neural Networks
- Computer Science · COLT
- 2022
It is shown that the topologies of deep convolutional networks (CNNs) restructure the associated eigenspaces into finer subspaces, and a sharp characterization of the generalization error of infinite-width CNNs of any depth in the high-dimensional setting is proved.
The SSL Interplay: Augmentations, Inductive Bias, and Generalization
- Psychology, Computer Science · ArXiv
- 2023
This work studies the complex interplay between the choice of data augmentation, network architecture, and training algorithm in self-supervised learning, with a precise analysis of generalization performance on both pretraining and downstream tasks in a theory-friendly setup.
Strong inductive biases provably prevent harmless interpolation
- Computer Science · ArXiv
- 2023
This paper argues that the degree to which interpolation is harmless hinges upon the strength of an estimator's inductive bias, i.e., how heavily the estimator favors solutions with a certain structure, and establishes tight non-asymptotic bounds for high-dimensional kernel regression that reflect this phenomenon for convolutional kernels.
On the Universal Approximation Property of Deep Fully Convolutional Neural Networks
- Computer Science · ArXiv
- 2022
It is proved that deep residual fully convolutional networks and their continuous-layer counterpart can achieve universal approximation of shift-invariant or equivariant functions at constant channel width.
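For reference, the two symmetry classes mentioned here admit the standard definitions below, stated for a (circular) shift operator acting on the input grid; the notation is ours.

```latex
% Standard definitions of the two target classes, for a shift operator $S_\tau$.
\[
  \text{shift-invariance:}\quad f(S_\tau x) = f(x),
  \qquad
  \text{shift-equivariance:}\quad f(S_\tau x) = S_\tau f(x),
  \qquad \text{for all shifts } \tau.
\]
```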
Transfer Learning with Kernel Methods
- Computer Science · ArXiv
- 2022
It is shown that transferring modern kernels trained on large-scale image datasets can result in a substantial performance increase compared to using the same kernel trained directly on the target task, and that transfer-learned kernels allow more accurate prediction of the effect of drugs on cancer cell lines.
Synergy and Symmetry in Deep Learning: Interactions between the Data, Model, and Inference Algorithm
- Computer Science · ICML
- 2022
This paper analyzes the triplet (D, M, I) of data, model, and inference algorithm as an integrated system and identifies important synergies that help mitigate the curse of dimensionality.
A view of mini-batch SGD via generating functions: conditions of convergence, phase transitions, benefit from negative momenta
- Computer Science · ArXiv
- 2022
A new analytic framework for the noise-averaged properties of mini-batch SGD on linear models with constant learning rate, momentum, and batch size is developed; the SGD dynamics are found to exhibit several convergent and divergent regimes depending on the spectral distributions of the problem.
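As a concrete illustration of the setting (not of the generating-function analysis itself), a minimal mini-batch SGD loop with constant learning rate, constant possibly-negative heavy-ball momentum, and fixed batch size on a linear least-squares model might look like the sketch below; all names and parameter values are assumptions for the demo.

```python
# A minimal sketch of the setting analyzed: constant step size, constant
# (possibly negative) heavy-ball momentum, fixed batch size, linear model.
import numpy as np


def minibatch_sgd_momentum(X, y, lr=0.05, beta=-0.2, batch_size=8,
                           steps=500, seed=0):
    """Heavy-ball SGD on the least-squares loss 0.5 * ||Xw - y||^2 / n."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, v = np.zeros(d), np.zeros(d)           # parameters and momentum buffer
    losses = []
    for _ in range(steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        v = beta * v + grad                   # beta < 0 gives negative momentum
        w = w - lr * v
        losses.append(0.5 * np.mean((X @ w - y) ** 2))
    return w, losses


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(256, 32))
    w_star = rng.normal(size=32)
    y = X @ w_star + 0.1 * rng.normal(size=256)
    _, losses = minibatch_sgd_momentum(X, y)
    print(f"final full-batch loss: {losses[-1]:.4f}")
```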
References
Showing 1-10 of 73 references
Neural Kernels Without Tangents
- Computer Science · ICML
- 2020
Using well-established feature space tools such as direct sum, averaging, and moment lifting, an algebra for creating "compositional" kernels from bags of features is presented that corresponds to many of the building blocks of neural tangent kernels (NTK).
Tensor products of Sobolev-Besov spaces and applications to approximation from the hyperbolic cross
- Mathematics · J. Approx. Theory
- 2009
Breaking the Curse of Dimensionality with Convex Neural Networks
- Computer Science · J. Mach. Learn. Res.
- 2017
This work considers neural networks with a single hidden layer and non-decreasing homogeneous activation functions like the rectified linear units and shows that they are adaptive to unknown underlying linear structures, such as the dependence on the projection of the input variables onto a low-dimensional subspace.
On Exact Computation with an Infinitely Wide Neural Net
- Computer Science · NeurIPS
- 2019
The current paper gives the first efficient exact algorithm for computing the extension of the NTK to convolutional neural nets, which is called the Convolutional NTK (CNTK), as well as an efficient GPU implementation of this algorithm.
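For orientation, the fully connected NTK recursion that the convolutional construction extends (with patch extraction and pooling at each layer) can be sketched as follows in standard notation, with normalization constants omitted; this is a sketch, not the paper's exact convolutional formulas.

```latex
% Fully connected NTK recursion (standard notation; normalization omitted).
\begin{align*}
  \Sigma^{(0)}(x, x') &= x^\top x', \qquad \Theta^{(0)}(x, x') = \Sigma^{(0)}(x, x'),\\
  \Lambda^{(h)}(x, x') &=
    \begin{pmatrix}
      \Sigma^{(h-1)}(x, x) & \Sigma^{(h-1)}(x, x')\\
      \Sigma^{(h-1)}(x', x) & \Sigma^{(h-1)}(x', x')
    \end{pmatrix},\\
  \Sigma^{(h)}(x, x') &= \mathbb{E}_{(u,v)\sim\mathcal{N}(0,\Lambda^{(h)})}\big[\sigma(u)\,\sigma(v)\big],
  \qquad
  \dot{\Sigma}^{(h)}(x, x') = \mathbb{E}_{(u,v)\sim\mathcal{N}(0,\Lambda^{(h)})}\big[\sigma'(u)\,\sigma'(v)\big],\\
  \Theta^{(h)}(x, x') &= \Sigma^{(h)}(x, x') + \Theta^{(h-1)}(x, x')\,\dot{\Sigma}^{(h)}(x, x').
\end{align*}
```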
Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review
- Computer Science · Int. J. Autom. Comput.
- 2017
An emerging body of theoretical results on deep learning, including the conditions under which it can be exponentially better than shallow learning, is reviewed, together with new results, open problems, and conjectures.
Regularization with Dot-Product Kernels
- Mathematics · NIPS
- 2000
This paper gives an explicit functional form for the feature map by calculating its eigenfunctions and eigenvalues and shows that if the kernel is analytic (i.e., it can be expanded in a Taylor series), all expansion coefficients have to be nonnegative.
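In standard notation, the Schoenberg-type picture for a dot-product kernel on the sphere takes the form below; this is a sketch, not the paper's exact statement.

```latex
% Dot-product kernel expansion and its Mercer decomposition on the sphere.
\[
  k(x, y) \;=\; \kappa(\langle x, y\rangle)
  \;=\; \sum_{n \ge 0} a_n\,\langle x, y\rangle^{n},
  \qquad a_n \ge 0,
\]
and on $\mathbb{S}^{d-1}$ the eigenfunctions are the spherical harmonics
$Y_{k,j}$, so that
\[
  k(x, y) \;=\; \sum_{k \ge 0} \mu_k \sum_{j=1}^{N(d,k)} Y_{k,j}(x)\,Y_{k,j}(y),
  \qquad \mu_k \ge 0.
\]
```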
Learning Theory from First Principles (draft)
- URL https://www.di.ens.fr/~fbach/ltfp_book.pdf
- 2021
High-dimensional statistics: A non-asymptotic viewpoint, volume 48
- 2019
Learning with invariances in random features and kernel models
- Computer Science, Mathematics · COLT
- 2021
This work characterizes the test error of invariant methods in a high-dimensional regime in which the sample size and number of hidden units scale as polynomials in the dimension, and shows that exploiting invariance in the architecture saves a factor of d in achieving the same test error as unstructured architectures.