Feature Learning in Infinite-Width Neural Networks
@article{Yang2020FeatureLI,
  title={Feature Learning in Infinite-Width Neural Networks},
  author={Greg Yang and Edward J. Hu},
  journal={ArXiv},
  year={2020},
  volume={abs/2011.14522}
}
As its width tends to infinity, a deep neural network's behavior under gradient descent can become simplified and predictable (e.g. given by the Neural Tangent Kernel (NTK)), if it is parametrized appropriately (e.g. the NTK parametrization). However, we show that the standard and NTK parametrizations of a neural network do not admit infinite-width limits that can learn features, which is crucial for pretraining and transfer learning such as with BERT. We propose simple modifications to the…
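As a rough illustration of the two parametrizations contrasted in the abstract (not code from the paper), the sketch below sets up a single fully connected layer under the standard and the NTK parametrization, using the commonly stated scaling conventions; the function names and NumPy setup are ours, and the paper's proposed modification is a further change to such scalings.

```python
# Minimal sketch, assuming the usual conventions: the "standard" parametrization
# initializes weights with variance 1/fan_in and applies no width-dependent
# multiplier, while the "NTK" parametrization draws weights from N(0, 1) and
# rescales the forward pass by 1/sqrt(fan_in).
import numpy as np

def standard_layer(x, n_in, n_out, rng):
    # Standard parametrization: variance-1/n_in weights, plain matrix multiply.
    W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))
    return W @ x

def ntk_layer(x, n_in, n_out, rng):
    # NTK parametrization: unit-variance weights, output scaled by 1/sqrt(n_in).
    W = rng.normal(0.0, 1.0, size=(n_out, n_in))
    return (W @ x) / np.sqrt(n_in)

rng = np.random.default_rng(0)
x = rng.normal(size=128)
print(standard_layer(x, 128, 256, rng).std())  # same order of magnitude...
print(ntk_layer(x, 128, 256, rng).std())       # ...at initialization
```

At initialization both layers produce outputs of the same order, but under gradient descent the two scalings induce different width dependence in how the weights (and hence the learned features) move, which is the distinction the abstract is drawing.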