Corpus ID: 238407996

Spectral Bias in Practice: The Role of Function Frequency in Generalization

@article{fridovich-keil2021spectral,
  title={Spectral Bias in Practice: The Role of Function Frequency in Generalization},
  author={Sara Fridovich-Keil and Raphael Gontijo Lopes and Rebecca Roelofs},
  year={2021}
}
Despite their ability to represent highly expressive functions, deep learning models seem to find simple solutions that generalize surprisingly well. Spectral bias – the tendency of neural networks to prioritize learning low frequency functions – is one possible explanation for this phenomenon, but so far spectral bias has primarily been observed in theoretical models and simplified experiments. In this work, we propose methodologies for measuring spectral bias in modern image classification… 
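One generic way to quantify the frequency content of images, in the spirit of such measurement methodologies (a minimal numpy sketch, not the paper's specific protocol), is a radially averaged power spectrum of the 2D discrete Fourier transform:

```python
import numpy as np

def radial_spectrum(img):
    """Average power of the 2D DFT in integer radial-frequency bins."""
    f = np.fft.fftshift(np.fft.fft2(img))          # DC moved to the center
    power = np.abs(f) ** 2
    h, w = img.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h // 2, xx - w // 2).astype(int)  # radius of each bin
    sums = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    return sums / np.maximum(counts, 1)            # mean power per radius

# Synthetic test image: a low-frequency horizontal gradient plus fine noise.
rng = np.random.default_rng(0)
img = np.linspace(0.0, 1.0, 64)[None, :] * np.ones((64, 1))
img = img + 0.1 * rng.normal(size=(64, 64))
spec = radial_spectrum(img)
print(spec[0], spec[30])  # low radii dominated by the gradient, high by noise
```

The radial average collapses the 2D spectrum to a 1D profile, which makes it easy to compare how much low- versus high-frequency energy different images (or learned functions evaluated on images) contain.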


On the Spectral Bias of Neural Networks
This work shows that deep ReLU networks are biased towards low frequency functions, and studies the robustness of the frequency components with respect to parameter perturbation, to develop the intuition that the parameters must be finely tuned to express high frequency functions.
Training behavior of deep neural network in frequency domain
For both real and synthetic datasets, it is empirically found that a DNN with common settings first quickly captures the dominant low-frequency components, and then relatively slowly captures the high-frequency ones.
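The low-then-high learning order can be reproduced in a few lines. The sketch below (all choices illustrative, not taken from the cited work) trains only the linear readout on fixed random tanh features, a common simplification of a wide network near initialization, and measures the residual at each target frequency:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a low-frequency plus a high-frequency sinusoid (the frequencies
# are illustrative choices).
x = np.linspace(0.0, 1.0, 256, endpoint=False)
y = np.sin(2 * np.pi * 1 * x) + np.sin(2 * np.pi * 10 * x)

# Fixed random tanh features; only the linear readout is trained.
H = 64
w = rng.normal(0.0, 3.0, H)
b = rng.normal(0.0, 1.0, H)
Phi = np.tanh(x[:, None] * w + b)           # (N, H) feature matrix

def amp(err, k):
    """Amplitude of the residual at integer frequency k (DFT projection)."""
    return 2.0 * abs(np.mean(err * np.exp(-2j * np.pi * k * x)))

# Step size chosen from the feature Gram matrix so gradient descent is stable.
lam_max = np.linalg.eigvalsh(Phi.T @ Phi / len(x)).max()
lr = 1.0 / lam_max

c = np.zeros(H)                              # readout weights
for step in range(5000):
    err = Phi @ c - y
    c -= lr * (Phi.T @ err) / len(x)         # full-batch gradient step

err = Phi @ c - y
low, high = amp(err, 1), amp(err, 10)
print(low, high)  # the frequency-1 residual shrinks; frequency-10 barely moves
```

The smooth random features can express the one-cycle sinusoid but not the ten-cycle one, so gradient descent drives the low-frequency error down quickly while the high-frequency error stays near its initial amplitude, mirroring the empirical observation in the summary above.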
Frequency Bias in Neural Networks for Input of Non-Uniform Density
The Neural Tangent Kernel (NTK) model is used to explore the effect of variable input density on training dynamics, and convergence results with respect to the spectral decomposition of the NTK are proved for deep, fully connected networks.
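The flavor of such NTK-style convergence results can be illustrated with a small computation: under kernel gradient flow, the residual along the kernel matrix's i-th eigenvector decays as exp(-lam_i * t), so large-eigenvalue (smooth) modes are learned first. The sketch below substitutes an RBF kernel for the NTK; the kernel choice and bandwidth are illustrative assumptions:

```python
import numpy as np

# RBF kernel matrix on a 1D grid (bandwidth 0.1 is an arbitrary choice).
x = np.linspace(0.0, 1.0, 100)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.1 ** 2))
lam, V = np.linalg.eigh(K)                  # eigenvalues in ascending order

y = np.sin(2 * np.pi * 3 * x)               # an arbitrary target function
coeffs = V.T @ y                            # target expressed in the eigenbasis

# Closed-form kernel gradient flow: mode i of the residual decays as
# exp(-lam_i * t), so the per-mode "unlearned fraction" at time t is:
t = 1.0
decay = np.exp(-lam * t)
residual = decay * coeffs                   # residual in the eigenbasis at time t
print(decay[-1], decay[0])  # top mode (largest eigenvalue) decays fastest
```

Because smooth kernels put their largest eigenvalues on their smoothest eigenvectors, this per-mode decay is exactly a spectral bias: the smooth components of y are fit almost immediately, while near-null-space components persist.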
The Convergence Rate of Neural Networks for Learned Functions of Different Frequencies
It is shown theoretically and experimentally that a shallow neural network without bias cannot represent or learn simple, low-frequency functions with odd frequencies, leading to specific predictions of the time it will take a network to learn functions of varying frequency.
Understanding deep learning requires rethinking generalization
These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth-two neural networks already have perfect finite-sample expressivity.
SGD on Neural Networks Learns Functions of Increasing Complexity
Key to the work is a new measure of how well one classifier explains the performance of another, based on conditional mutual information, which can be helpful in explaining why SGD-learned classifiers tend to generalize well even in the over-parameterized regime.
To understand deep learning we need to understand kernel learning
It is argued that progress on understanding deep learning will be difficult until more tractable "shallow" kernel methods are better understood, and that new theoretical ideas are needed for understanding the properties of classical kernel methods.
Intriguing properties of neural networks
It is found that there is no distinction between individual high-level units and random linear combinations of high-level units, according to various methods of unit analysis, and it is suggested that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.
Generalization Guarantees for Neural Networks via Harnessing the Low-rank Structure of the Jacobian
A data-dependent optimization and generalization theory is developed that leverages the low-rank structure of the Jacobian matrix associated with the network, showing that even constant-width neural nets can provably generalize for sufficiently nice datasets.
Train longer, generalize better: closing the generalization gap in large batch training of neural networks
This work proposes a "random walk on a random landscape" statistical model, which is known to exhibit similar "ultra-slow" diffusion behavior, and presents a novel algorithm named "Ghost Batch Normalization" which enables a significant decrease in the generalization gap without increasing the number of updates.