Corpus ID: 235446555

Locality defeats the curse of dimensionality in convolutional teacher-student scenarios

Alessandro Favero, F. Cagnetta, M. Wyart
Convolutional neural networks perform a local and translationally-invariant treatment of the data: quantifying which of these two aspects is central to their success remains a challenge. We study this problem within a teacher-student framework for kernel regression, using ‘convolutional’ kernels inspired by the neural tangent kernel of simple convolutional architectures of given filter size. Using heuristic methods from physics, we find in the ridgeless case that locality is key in determining…
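The setup described in the abstract can be sketched as ridgeless kernel regression in which teacher and student share a ‘convolutional’ kernel built from local patches of the input. A minimal illustration follows; the quadratic patch kernel, filter size, and dimensions are illustrative choices, not the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_kernel(X, Y, s):
    # 'Convolutional' kernel: average of a quadratic kernel applied to
    # every size-s patch (with periodic wrap-around) of the inputs.
    d = X.shape[1]
    K = np.zeros((X.shape[0], Y.shape[0]))
    for i in range(d):
        idx = [(i + j) % d for j in range(s)]
        K += (X[:, idx] @ Y[:, idx].T) ** 2
    return K / d

d, s = 10, 3  # input dimension and filter size (illustrative)
X_tr = rng.standard_normal((200, d))
X_te = rng.standard_normal((50, d))

# Teacher: a random function in the RKHS of the same local kernel.
anchors = rng.standard_normal((20, d))
alpha = rng.standard_normal(20)
y_tr = conv_kernel(X_tr, anchors, s) @ alpha
y_te = conv_kernel(X_te, anchors, s) @ alpha

# Student: (near-)ridgeless kernel regression on the training set.
K = conv_kernel(X_tr, X_tr, s)
coef = np.linalg.solve(K + 1e-8 * np.eye(len(K)), y_tr)
mse = np.mean((conv_kernel(X_te, X_tr, s) @ coef - y_te) ** 2)
```

Because the teacher lives in the student's (finite-dimensional, local) RKHS, the student recovers it well from modest data; varying the filter size `s` is the knob the teacher-student analysis turns.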
1 Citation

Figures from this paper

On the Sample Complexity of Learning with Geometric Stability
This work provides non-parametric rates of convergence for kernel methods, and shows improvements in sample complexity by a factor equal to the size of the group when using an invariant kernel over the group, compared to the corresponding non-invariant kernel.
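The invariant-kernel construction mentioned above can be obtained by averaging a base kernel over the group. A minimal sketch, where the quadratic base kernel and the cyclic-shift group are illustrative choices rather than the paper's setup:

```python
import numpy as np

def base_kernel(x, y):
    # A simple non-invariant base kernel (quadratic dot-product kernel).
    return float(x @ y) ** 2

def invariant_kernel(x, y):
    # Average the base kernel over all cyclic shifts (translations on a
    # ring); the result is invariant to shifting either argument.
    d = len(y)
    return np.mean([base_kernel(x, np.roll(y, g)) for g in range(d)])

rng = np.random.default_rng(0)
x, y = rng.standard_normal(8), rng.standard_normal(8)

# Invariance check: shifting an input leaves the kernel value unchanged.
k0 = invariant_kernel(x, y)
k1 = invariant_kernel(x, np.roll(y, 3))
```

Averaging over a group of size |G| is what yields the factor-of-|G| improvement in sample complexity that the summary refers to.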


References

On Approximation in Deep Convolutional Networks: a Kernel Perspective
It is found that while expressive kernels operating on input patches are important at the first layer, simpler polynomial kernels can suffice in higher layers for good performance. A precise functional description of the RKHS and its regularization properties is also provided.
Theoretical issues in deep networks
It is proved that for certain types of compositional functions, deep networks of the convolutional type (even without weight sharing) can avoid the curse of dimensionality.
End-to-End Kernel Learning with Supervised Convolutional Kernel Networks
J. Mairal · Computer Science, Mathematics · NIPS · 2016
A new image representation based on a multilayer kernel machine that achieves reasonably competitive performance for image classification on some standard "deep learning" datasets and also for image super-resolution, demonstrating the applicability of the approach to a large variety of image-related tasks.
On Lazy Training in Differentiable Programming
This work shows that this "lazy training" phenomenon is not specific to over-parameterized neural networks, and is due to a choice of scaling that makes the model behave as its linearization around the initialization, thus yielding a model equivalent to learning with positive-definite kernels.
On the Inductive Bias of Neural Tangent Kernels
This work studies smoothness, approximation, and stability properties of functions with finite norm, including stability to image deformations in the case of convolutional networks, and compares to other known kernels for similar architectures.
Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks
This work investigates generalization error for kernel regression and proposes a predictive theory of generalization in kernel regression applicable to real data. The theory explains various generalization phenomena observed in wide neural networks, which admit a kernel limit and generalize well despite being overparameterized.
Deep Learning Scaling is Predictable, Empirically
A large-scale empirical characterization of generalization error and model-size growth as training sets grow is presented, and it is shown that model size scales sublinearly with data size.
On Exact Computation with an Infinitely Wide Neural Net
The current paper gives the first efficient exact algorithm for computing the extension of the NTK to convolutional neural nets, called the Convolutional NTK (CNTK), as well as an efficient GPU implementation of this algorithm.
Towards Learning Convolutions from Scratch
This work proposes $\beta$-LASSO, a simple variant of the LASSO algorithm that, when applied to fully-connected networks for image classification tasks, learns architectures with local connections and achieves state-of-the-art accuracies for training fully-connected nets.
Capturing the learning curves of generic features maps for realistic data sets with a teacher-student model
A rigorous formula is proved for the asymptotic training loss and generalisation error achieved by empirical risk minimization for the high-dimensional Gaussian covariate model used in teacher-student models.