# Spectral Bias in Practice: The Role of Function Frequency in Generalization

Despite their ability to represent highly expressive functions, deep learning models seem to find simple solutions that generalize surprisingly well. Spectral bias – the tendency of neural networks to prioritize learning low frequency functions – is one possible explanation for this phenomenon, but so far spectral bias has primarily been observed in theoretical models and simplified experiments. In this work, we propose methodologies for measuring spectral bias in modern image classification…

On the Spectral Bias of Neural Networks

- Computer Science
- ICML
- 2019

This work shows that deep ReLU networks are biased towards low frequency functions, and studies the robustness of the frequency components with respect to parameter perturbation, to develop the intuition that the parameters must be finely tuned to express high frequency functions.

Training behavior of deep neural network in frequency domain

- Computer Science
- ICONIP
- 2019

For both real and synthetic datasets, it is empirically found that a DNN with common settings first quickly captures the dominant low-frequency components, and then relatively slowly captures the high-frequency ones.
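One way to make "captures low frequencies first" measurable is to compare the discrete Fourier coefficients of the model output with those of the target at a few chosen frequencies. The following is a minimal sketch under stated assumptions, not the paper's own procedure: the 1D target, the grid, the function name `relative_freq_error`, and the stand-in prediction `pred` (used in place of a partially trained network) are all hypothetical.

```python
import numpy as np

def relative_freq_error(target, pred, freqs):
    """Relative error |F[pred](k) - F[target](k)| / |F[target](k)| for each DFT index k."""
    t_hat = np.fft.rfft(target)
    p_hat = np.fft.rfft(pred)
    return {k: float(abs(p_hat[k] - t_hat[k]) / abs(t_hat[k])) for k in freqs}

# Hypothetical target on a uniform grid: one low (k=1) and one high (k=8) frequency.
n = 256
x = np.arange(n) / n
low = np.sin(2 * np.pi * 1 * x)
high = 0.5 * np.sin(2 * np.pi * 8 * x)
target = low + high

# Stand-in for a model early in training that has fit only the low-frequency part.
pred = low.copy()

errors = relative_freq_error(target, pred, freqs=[1, 8])
print(errors)  # error near 0 at k=1, near 1 at k=8
```

Evaluating such a function on the network output at successive training steps would give one error curve per frequency, which is the kind of trace that can show low-frequency components converging first.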

Frequency Bias in Neural Networks for Input of Non-Uniform Density

- Computer Science
- ICML
- 2020

The Neural Tangent Kernel (NTK) model is used to explore the effect of variable density on training dynamics, and convergence results for deep, fully connected networks are proved with respect to the spectral decomposition of the NTK.

The Convergence Rate of Neural Networks for Learned Functions of Different Frequencies

- Computer Science
- NeurIPS
- 2019

It is shown theoretically and experimentally that a shallow neural network without bias cannot represent or learn simple, low-frequency functions with odd frequencies, and the results lead to specific predictions of the time it will take a network to learn functions of varying frequency.

Understanding deep learning requires rethinking generalization

- Computer Science
- ICLR
- 2017

These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth two neural networks already have perfect finite sample expressivity.

SGD on Neural Networks Learns Functions of Increasing Complexity

- Computer Science
- NeurIPS
- 2019

Key to the work is a new measure of how well one classifier explains the performance of another, based on conditional mutual information, which can be helpful in explaining why SGD-learned classifiers tend to generalize well even in the over-parameterized regime.
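The summary above does not spell out the measure itself, so the following sketch shows only the underlying quantity: a plug-in estimate of the conditional mutual information I(F; Y | L) from discrete samples. The function name and the reading of the variables (F and L as predictions of two classifiers, Y as the true label) are assumptions made for illustration, and the paper's exact definition may combine this quantity differently.

```python
import math
from collections import Counter

def conditional_mutual_information(f, y, l):
    """Plug-in estimate of I(F; Y | L) in bits from paired discrete samples."""
    n = len(f)
    c_fyl = Counter(zip(f, y, l))
    c_fl = Counter(zip(f, l))
    c_yl = Counter(zip(y, l))
    c_l = Counter(l)
    total = 0.0
    for (fi, yi, li), c in c_fyl.items():
        # p(f,y,l) * log2( p(l) p(f,y,l) / (p(f,l) p(y,l)) ); the 1/n factors cancel in the ratio.
        ratio = c * c_l[li] / (c_fl[(fi, li)] * c_yl[(yi, li)])
        total += (c / n) * math.log2(ratio)
    return total

# Hypothetical toy samples: F predicts Y perfectly.
f = [0, 1, 0, 1]
y = [0, 1, 0, 1]

# L is constant, so it explains nothing about F's agreement with Y.
cmi_uninformative = conditional_mutual_information(f, y, [0, 0, 0, 0])
# L equals Y, so nothing is left for F to explain once L is known.
cmi_informative = conditional_mutual_information(f, y, y)
print(cmi_uninformative, cmi_informative)
```

The plug-in estimator is biased for small samples and many classes; it is used here only because it is short and easy to check by hand.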

To understand deep learning we need to understand kernel learning

- Computer Science
- ICML
- 2018

It is argued that progress on understanding deep learning will be difficult until more tractable "shallow" kernel methods are better understood, and that new theoretical ideas are needed for understanding the properties of classical kernel methods.

Intriguing properties of neural networks

- Computer Science
- ICLR
- 2014

It is found that there is no distinction between individual high-level units and random linear combinations of high-level units, according to various methods of unit analysis, and it is suggested that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.

Generalization Guarantees for Neural Networks via Harnessing the Low-rank Structure of the Jacobian

- Computer Science
- ArXiv
- 2019

A data-dependent optimization and generalization theory is developed that leverages the low-rank structure of the Jacobian matrix associated with the network, and it is shown that even constant-width neural nets can provably generalize for sufficiently nice datasets.

Train longer, generalize better: closing the generalization gap in large batch training of neural networks

- Computer Science
- NIPS
- 2017

This work proposes a "random walk on random landscape" statistical model which is known to exhibit similar "ultra-slow" diffusion behavior and presents a novel algorithm named "Ghost Batch Normalization" which enables significant decrease in the generalization gap without increasing the number of updates.