Corpus ID: 238407996

Spectral Bias in Practice: The Role of Function Frequency in Generalization

@article{FridovichKeil2021SpectralBI,
  title={Spectral Bias in Practice: The Role of Function Frequency in Generalization},
  author={Sara Fridovich-Keil and Raphael Gontijo Lopes and Rebecca Roelofs},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.02424}
}
Despite their ability to represent highly expressive functions, deep learning models trained with SGD seem to find simple, constrained solutions that generalize surprisingly well. Spectral bias – the tendency of neural networks to prioritize learning low frequency functions – is one possible explanation for this phenomenon, but so far spectral bias has only been observed in theoretical models and simplified experiments. In this work, we propose methodologies for measuring spectral bias in… 
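For intuition, the following sketch shows one simple way spectral bias can be observed empirically. It is an illustrative toy experiment, not the measurement methodology proposed in the paper: a small ReLU MLP is fit to a 1D sum of two sinusoids, and the Fourier amplitude of each target frequency in the network's output is tracked over training. The frequencies, architecture, and optimizer settings below are assumptions chosen only for illustration.

import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical 1D target: equal-amplitude sinusoids at a low and a high frequency.
freqs = [1, 10]                # cycles per unit interval (assumed values)
n = 512
x = (torch.arange(n, dtype=torch.float32) / n).unsqueeze(1)   # samples in [0, 1)
y = sum(torch.sin(2 * np.pi * f * x) for f in freqs)          # target function

model = nn.Sequential(
    nn.Linear(1, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def fourier_amplitudes(pred):
    # Amplitude of each target frequency in the current fit;
    # a value near 1.0 means that component has been learned.
    spectrum = np.fft.rfft(pred.detach().numpy().ravel())
    return [2.0 * abs(spectrum[f]) / n for f in freqs]

for step in range(2001):
    opt.zero_grad()
    loss = ((model(x) - y) ** 2).mean()
    loss.backward()
    opt.step()
    if step % 250 == 0:
        amps = fourier_amplitudes(model(x))
        print(f"step {step:4d}  " +
              "  ".join(f"f={f}: {a:.2f}" for f, a in zip(freqs, amps)))

In runs of this kind, the low frequency component is typically fit well before the high frequency one, which is the qualitative signature of spectral bias that the paper sets out to measure at scale.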

References

Showing 1-10 of 40 references
On the Spectral Bias of Neural Networks
TLDR
This work shows that deep ReLU networks are biased towards low frequency functions and, by studying the robustness of the frequency components under parameter perturbation, develops the intuition that parameters must be finely tuned to express high frequency functions.
The Convergence Rate of Neural Networks for Learned Functions of Different Frequencies
TLDR
It is shown theoretically and experimentally that a shallow neural network without bias cannot represent or learn simple, low frequency functions with odd frequencies, and specific predictions are made of the time it will take a network to learn functions of varying frequency.
Frequency Bias in Neural Networks for Input of Non-Uniform Density
TLDR
The Neural Tangent Kernel (NTK) model is used to explore the effect of variable input density on training dynamics, and convergence results are proved for deep, fully connected networks with respect to the spectral decomposition of the NTK.
Understanding deep learning requires rethinking generalization
TLDR
These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth two neural networks already have perfect finite sample expressivity.
SGD on Neural Networks Learns Functions of Increasing Complexity
TLDR
Key to the work is a new measure of how well one classifier explains the performance of another, based on conditional mutual information, which can be helpful in explaining why SGD-learned classifiers tend to generalize well even in the over-parameterized regime.
To understand deep learning we need to understand kernel learning
TLDR
It is argued that progress on understanding deep learning will be difficult until more tractable "shallow" kernel methods are better understood, and that new theoretical ideas are needed to understand the properties of classical kernel methods.
Intriguing properties of neural networks
TLDR
It is found that there is no distinction between individual high-level units and random linear combinations of high-level units, according to various methods of unit analysis, and it is suggested that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.
Train longer, generalize better: closing the generalization gap in large batch training of neural networks
TLDR
This work proposes a "random walk on random landscape" statistical model, which is known to exhibit similar "ultra-slow" diffusion behavior, and presents a novel algorithm named "Ghost Batch Normalization" which enables a significant decrease in the generalization gap without increasing the number of updates.
Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks
TLDR
A novel complexity measure based on unit-wise capacities is presented, yielding a tighter generalization bound for two-layer ReLU networks, together with a matching lower bound on the Rademacher complexity that improves over previous capacity lower bounds for neural networks.
Train faster, generalize better: Stability of stochastic gradient descent
We show that parametric models trained by a stochastic gradient method (SGM) with few iterations have vanishing generalization error. We prove our results by arguing that SGM is algorithmically stable.