Corpus ID: 220042268

When Do Neural Networks Outperform Kernel Methods?

  • B. Ghorbani, Song Mei, Theodor Misiakiewicz, A. Montanari
  • Published 2020
  • Computer Science, Mathematics
  • ArXiv
  • For a certain scaling of the initialization of stochastic gradient descent (SGD), wide neural networks (NNs) have been shown to be well approximated by reproducing kernel Hilbert space (RKHS) methods. Recent empirical work showed that, for some classification tasks, RKHS methods can replace NNs without a large loss in performance. On the other hand, two-layer NNs are known to encode richer smoothness classes than RKHS, and we know of special examples for which SGD-trained NNs provably outperform…
    14 Citations
    Mathematical Models of Overparameterized Neural Networks
    Deep Networks and the Multiple Manifold Problem
    A Dynamical Central Limit Theorem for Shallow Neural Networks
    • 2
    Hold me tight! Influence of discriminative features on deep network boundaries
    • 6
    Perspective: A Phase Diagram for Deep Learning unifying Jamming, Feature Learning and Lazy Training


    Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss
    • 54
    What Can ResNet Learn Efficiently, Going Beyond Kernels?
    • 48
    Toward Moderate Overparameterization: Global Convergence Guarantees for Training Shallow Neural Networks
    • 110
    Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks
    • 41
    On Lazy Training in Differentiable Programming
    • 178
    Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
    • 269
    Breaking the Curse of Dimensionality with Convex Neural Networks
    • 292
    A Convergence Theory for Deep Learning via Over-Parameterization
    • 460
    Neural Kernels Without Tangents
    • 21
    On the Power and Limitations of Random Features for Understanding Neural Networks
    • 62