Corpus ID: 195069235

Disentangling feature and lazy learning in deep neural networks: an empirical study

@article{Geiger2019DisentanglingFA,
  title={Disentangling feature and lazy learning in deep neural networks: an empirical study},
  author={M. Geiger and S. Spigler and Arthur Jacot and M. Wyart},
  journal={ArXiv},
  year={2019},
  volume={abs/1906.08034}
}
Two distinct limits for deep learning as the net width $h\to\infty$ have been proposed, depending on how the weights of the last layer scale with $h$. In the "lazy-learning" regime, the dynamics becomes linear in the weights and is described by a Neural Tangent Kernel $\Theta$. By contrast, in the "feature-learning" regime, the dynamics can be expressed in terms of the density distribution of the weights. Understanding which regime accurately describes practical architectures and which one…
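
To make the two scalings concrete, here is a minimal sketch (not the authors' code) of a two-layer ReLU network f(x) = h^(-a) * sum_j c_j relu(w_j . x) trained with full-batch gradient descent: a = 1/2 corresponds to the lazy / NTK-style scaling, a = 1 to the feature-learning (mean-field) scaling, whose learning rate is conventionally scaled up by h. The data, hyperparameters, and the use of ||W - W0|| / ||W0|| as a proxy for how much the features move are illustrative assumptions, not choices taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

def train(h, a, lr, n=50, d=10, steps=300):
    """Return (final mean squared error, relative change of first-layer weights)."""
    X = rng.standard_normal((n, d)) / np.sqrt(d)   # inputs with ||x|| ~ 1
    y = rng.standard_normal(n)                     # random targets

    W = rng.standard_normal((h, d))                # first-layer weights (the "features")
    c = rng.standard_normal(h)                     # last-layer weights
    W0 = W.copy()
    scale = h ** (-a)                              # a = 1/2: lazy/NTK, a = 1: mean-field

    for _ in range(steps):
        pre = X @ W.T                              # (n, h) pre-activations
        act = np.maximum(pre, 0.0)                 # ReLU
        err = scale * act @ c - y                  # residuals of the square loss

        grad_c = scale * act.T @ err / n
        grad_W = scale * ((err[:, None] * (pre > 0.0)) * c).T @ X / n
        c -= lr * grad_c
        W -= lr * grad_W

    mse = np.mean((scale * np.maximum(X @ W.T, 0.0) @ c - y) ** 2)
    return mse, np.linalg.norm(W - W0) / np.linalg.norm(W0)

for h in (100, 1000, 10000):
    lazy = train(h, a=0.5, lr=0.5)
    feat = train(h, a=1.0, lr=0.5 * h)
    print(f"h={h:6d}  lazy:    mse={lazy[0]:.3f}  |dW|/|W0|={lazy[1]:.3e}")
    print(f"{'':9s} feature: mse={feat[0]:.3f}  |dW|/|W0|={feat[1]:.3e}")

Under these assumptions the relative weight change in the lazy scaling should shrink roughly like h^(-1/2) as the width grows, while in the feature-learning scaling it should stay of order one.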

Citations

Kernel and Rich Regimes in Overparametrized Models
Double Trouble in Double Descent : Bias and Variance(s) in the Lazy Regime
When Do Neural Networks Outperform Kernel Methods?
Mean-field inference methods for neural networks
Extreme Memorization via Scale of Initialization
Asymptotics of Wide Networks from Feynman Diagrams
A Dynamical Central Limit Theorem for Shallow Neural Networks
Finite Versus Infinite Neural Networks: an Empirical Study
Non-Gaussian processes and neural networks at finite widths (Sho Yaida, MSML 2020)

References

Showing 1-10 of 40 references
A Convergence Theory for Deep Learning via Over-Parameterization
Scaling description of generalization with number of parameters in deep learning
Gradient Descent Provably Optimizes Over-parameterized Neural Networks
Comparing Dynamics: Deep Neural Networks versus Glassy Systems
The jamming transition as a paradigm to understand the loss landscape of deep neural networks