Corpus ID: 236635194

Deep Networks Provably Classify Data on Curves

@inproceedings{Wang2021DeepNP,
  title={Deep Networks Provably Classify Data on Curves},
  author={Tingran Wang and Sam Buchanan and Dar Gilboa and John N. Wright},
  booktitle={NeurIPS},
  year={2021}
}
Data with low-dimensional nonlinear structure are ubiquitous in engineering and scientific problems. We study a model problem with such structure—a binary classification task that uses a deep fully-connected neural network to classify data drawn from two disjoint smooth curves on the unit sphere. Aside from mild regularity conditions, we place no restrictions on the configuration of the curves. We prove that when (i) the network depth is large relative to certain geometric properties that set… 
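
To make the setup concrete, here is a minimal, hypothetical sketch (in PyTorch, which the paper does not prescribe) of the model problem: two disjoint smooth curves on the unit sphere, one per class, classified by a deep fully-connected ReLU network trained by gradient descent on the logistic loss. The specific curves, depth, width, step size, and iteration count are illustrative assumptions, not the configuration analyzed in the paper.

# Sketch of the two-curve classification problem on the unit sphere.
# All hyperparameters below are illustrative choices; the paper's guarantees
# concern gradient descent on deep fully-connected networks under geometric
# conditions on the curves, not this particular configuration.
import numpy as np
import torch
import torch.nn as nn

def sample_curve(n, height, rng):
    """Sample n points from a latitude circle on the 2-sphere at the given height."""
    t = rng.uniform(0.0, 2.0 * np.pi, size=n)
    r = np.sqrt(1.0 - height ** 2)  # radius chosen so every point has unit norm
    return np.stack([r * np.cos(t), r * np.sin(t), np.full(n, height)], axis=1)

rng = np.random.default_rng(0)
n_per_class = 500
X_pos = sample_curve(n_per_class, +0.5, rng)   # smooth curve for class 1
X_neg = sample_curve(n_per_class, -0.5, rng)   # disjoint smooth curve for class 0
X = torch.tensor(np.vstack([X_pos, X_neg]), dtype=torch.float32)
y = torch.tensor([1.0] * n_per_class + [0.0] * n_per_class).unsqueeze(1)

# Deep fully-connected ReLU network (depth and width are arbitrary here).
depth, width = 8, 256
layers = [nn.Linear(3, width), nn.ReLU()]
for _ in range(depth - 2):
    layers += [nn.Linear(width, width), nn.ReLU()]
layers.append(nn.Linear(width, 1))
net = nn.Sequential(*layers)

# Full-batch gradient descent on the binary cross-entropy (logistic) loss.
opt = torch.optim.SGD(net.parameters(), lr=0.1)
loss_fn = nn.BCEWithLogitsLoss()
for _ in range(3000):
    opt.zero_grad()
    loss = loss_fn(net(X), y)
    loss.backward()
    opt.step()

with torch.no_grad():
    acc = ((net(X) > 0).float() == y).float().mean().item()
print(f"final loss {loss.item():.4f}, training accuracy {acc:.3f}")

Two latitude circles are only the simplest instance; the paper places no restrictions on the configuration of the curves beyond mild regularity conditions.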

On the principles of Parsimony and Self-consistency for the emergence of intelligence

TLDR
A theoretical framework is proposed that situates deep networks within a broader picture of intelligence in general, introducing two fundamental principles, Parsimony and Self-consistency, which address two fundamental questions about intelligence: what to learn and how to learn, respectively.

References

Showing 1-10 of 66 references

Deep Networks and the Multiple Manifold Problem

TLDR
This work contributes essentially optimal rates of concentration for the neural tangent kernel of deep fully-connected networks, requiring the network width n to grow polynomially with the depth in order to achieve uniform concentration of the initial kernel over a low-dimensional submanifold of the unit sphere, and a nonasymptotic framework for establishing generalization of networks trained in the NTK regime with structured data.

The Intrinsic Dimension of Images and Its Impact on Learning

TLDR
It is found that common natural image datasets indeed have very low intrinsic dimension relative to the high number of pixels, that such low-dimensional datasets are easier for neural networks to learn, and that models trained on them generalize better from training to test data.

Minimum-Distortion Embedding

TLDR
A projected quasi-Newton method is developed that approximately solves MDE problems and scales to large datasets, and the framework provides principled ways of validating historical and new embeddings alike.

Learning with invariances in random features and kernel models

TLDR
This work characterizes the test error of invariant methods in a high-dimensional regime in which the sample size and number of hidden units scale as polynomials in the dimension, and shows that exploiting invariance in the architecture saves a factor of d in achieving the same test error as unstructured architectures.

Classifying high-dimensional Gaussian mixtures: Where kernel methods fail and neural networks succeed

TLDR
It is theoretically shown that two-layer neural networks (2LNN) with only a few neurons can beat the performance of kernel learning on a simple Gaussian mixture classification task, and that over-parametrising the neural network leads to faster convergence but does not improve its final performance.

A Local Convergence Theory for Mildly Over-Parameterized Two-Layer Neural Network

TLDR
A local convergence theory for mildly over-parameterized two-layer neural networks is developed, showing that as long as the loss is already lower than a threshold (polynomial in the relevant parameters), all student neurons in an over-parameterized network will converge to one of the teacher neurons and the loss will go to 0.

Generalization error of random feature and kernel methods: hypercontractivity and kernel matrix concentration

Deep Equals Shallow for ReLU Networks in Kernel Regimes

TLDR
It is shown that for ReLU activations, the kernels derived from deep fully-connected networks have essentially the same approximation properties as their "shallow" two-layer counterparts, namely the same eigenvalue decay for the corresponding integral operator.

Deep Neural Tangent Kernel and Laplace Kernel Have the Same RKHS

We prove that the reproducing kernel Hilbert spaces (RKHS) of a deep neural tangent kernel and the Laplace kernel include the same set of functions, when both kernels are restricted to the sphere.

The Interpolation Phase Transition in Neural Networks: Memorization and Generalization under Lazy Training

TLDR
In the context of two-layer neural networks in the neural tangent (NT) regime, it is shown that the network approximately performs ridge regression in the raw features, with a strictly positive 'self-induced' regularization.
...