Corpus ID: 227229101

Feature Learning in Infinite-Width Neural Networks

@article{Yang2020FeatureLI,
  title={Feature Learning in Infinite-Width Neural Networks},
  author={Greg Yang and Edward J. Hu},
  journal={ArXiv},
  year={2020},
  volume={abs/2011.14522}
}
  • Greg Yang, Edward J. Hu
  • Published 2020
  • Computer Science, Physics
  • ArXiv
  • As its width tends to infinity, a deep neural network's behavior under gradient descent can become simplified and predictable (e.g. given by the Neural Tangent Kernel (NTK)), if it is parametrized appropriately (e.g. the NTK parametrization). However, we show that the standard and NTK parametrizations of a neural network do not admit infinite-width limits that can learn features, which is crucial for pretraining and transfer learning such as with BERT. We propose simple modifications to the…
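
    The abstract contrasts the standard and NTK parametrizations. As a minimal, hedged sketch (not code from the paper), the NumPy snippet below writes out the conventional form of the two parametrizations for a single hidden layer; the layer sizes and variable names are illustrative assumptions.

        import numpy as np

        # Illustration only (assumed conventional forms, not the paper's code):
        # one hidden layer of width n acting on a d-dimensional input.
        n, d = 4096, 10
        x = np.random.randn(d)

        # Standard parametrization: weights initialized with variance 1/fan_in,
        # no output multiplier.
        W_std = np.random.randn(n, d) / np.sqrt(d)
        h_std = W_std @ x

        # NTK parametrization: weights initialized with variance 1, with an
        # explicit 1/sqrt(fan_in) multiplier on the layer output.
        W_ntk = np.random.randn(n, d)
        h_ntk = (W_ntk @ x) / np.sqrt(d)

        # Both give identically distributed pre-activations at initialization;
        # they differ in how gradient updates scale as n grows, which is what
        # governs whether features can move in the infinite-width limit.
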
    1 Citation

    When Does Preconditioning Help or Hurt Generalization?
