# Principal Components Bias in Deep Neural Networks

@inproceedings{Hacohen2021PrincipalCB, title={Principal Components Bias in Deep Neural Networks}, author={Guy Hacohen and Daphna Weinshall}, year={2021} }

Recent work suggests that convolutional neural networks of different architectures learn to classify images in the same order. To understand this phenomenon, we revisit the over-parametrized deep linear network model. Our asymptotic analysis, assuming that the hidden layers are wide enough, reveals that the convergence rate of this model’s parameters is exponentially faster along directions corresponding to the larger principal components of the data, at a rate governed by the singular values…
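The abstract's central claim can be illustrated with a toy experiment, here a minimal sketch using gradient descent on ordinary least squares as a one-layer stand-in for the deep linear networks the paper analyzes (all constants and variable names below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Data whose principal components have very different strengths
# (axis-aligned for simplicity, so each coordinate is one component).
d, n = 4, 2000
component_scale = np.array([3.0, 1.5, 0.5, 0.1])
X = rng.standard_normal((n, d)) * component_scale
w_star = np.ones(d)          # ground-truth parameters
y = X @ w_star

w = np.zeros(d)
lr = 0.1
steps_to_converge = np.full(d, -1)
for t in range(20000):
    grad = X.T @ (X @ w - y) / n      # least-squares gradient
    w -= lr * grad
    done = np.abs(w - w_star) < 1e-3
    steps_to_converge[(steps_to_converge < 0) & done] = t

# Parameters along stronger principal components converge much sooner.
print(steps_to_converge)
```

In this quadratic setting the error along component `i` shrinks by a factor of roughly `(1 - lr * λ_i)` per step, where `λ_i` is the component's variance, so the orders-of-magnitude gap in convergence times mirrors the exponential separation described in the abstract.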


#### One Citation

The Grammar-Learning Trajectories of Neural Language Models

- Computer Science
- ArXiv
- 2021

The learning trajectories of linguistic phenomena provide insight into the nature of linguistic representation, beyond what can be gleaned from inspecting the behavior of an adult speaker. To apply a…

#### References

Showing 1–10 of 56 references.

Towards Understanding the Generalization Bias of Two Layer Convolutional Linear Classifiers with Gradient Descent

- Computer Science, Mathematics
- AISTATS
- 2019

A general analysis of the generalization performance as a function of data distribution and convolutional filter size is provided, given gradient descent as the optimization algorithm, and the results are interpreted using concrete examples.

On the Spectral Bias of Neural Networks

- Computer Science, Mathematics
- ICML
- 2019

This work shows that deep ReLU networks are biased towards low frequency functions, and studies the robustness of the frequency components with respect to parameter perturbation, to develop the intuition that the parameters must be finely tuned to express high frequency functions.

The Implicit Bias of Depth: How Incremental Learning Drives Generalization

- Computer Science, Mathematics
- ICLR
- 2020

The notion of incremental learning dynamics is defined, and the conditions on depth and initialization under which this phenomenon arises in deep linear models are derived, proving that while shallow models can exhibit incremental learning dynamics, they require exponentially small initialization for these dynamics to present themselves.

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers

- Computer Science, Mathematics
- NeurIPS
- 2019

It is proved that overparameterized neural networks can learn some notable concept classes, including those expressible by two- and three-layer networks with fewer parameters and smooth activations, using SGD (stochastic gradient descent) or its variants in polynomial time with polynomially many samples.

SGD on Neural Networks Learns Functions of Increasing Complexity

- Computer Science, Mathematics
- NeurIPS
- 2019

Key to the work is a new measure of how well one classifier explains the performance of another, based on conditional mutual information, which can be helpful in explaining why SGD-learned classifiers tend to generalize well even in the over-parameterized regime.

Understanding deep learning requires rethinking generalization

- Computer Science
- ICLR
- 2017

These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth-two neural networks already have perfect finite sample expressivity.

Analysis of feature learning in weight-tied autoencoders via the mean field lens

- Computer Science, Physics
- ArXiv
- 2021

A new argument proves that the required number of neurons N for autoencoder models is only polynomial in the data dimension d, and it is conjectured that N is necessarily larger than a data-dependent intrinsic dimension, a behavior that is fundamentally different from previously studied setups.

The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks

- Computer Science, Mathematics
- NeurIPS
- 2020

It is formally proved that, for a class of well-behaved input distributions, the early-time learning dynamics of a two-layer fully-connected neural network can be mimicked by training a simple linear model on the inputs.

Implicit Regularization of Discrete Gradient Dynamics in Deep Linear Neural Networks

- Computer Science, Mathematics
- NeurIPS
- 2019

This work studies the discrete gradient dynamics of the training of a two-layer linear network with the least-squares loss, using a time rescaling to show that this dynamics sequentially learns the solutions of a reduced-rank regression with a gradually increasing rank.
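The sequential, rank-by-rank learning summarized above is easy to observe numerically. A hypothetical sketch (names and constants are invented for illustration): gradient descent on a two-layer linear factorization `W2 @ W1` of a rank-3 target, started from a small initialization, picks up the target's singular directions one at a time, largest singular value first.

```python
import numpy as np

rng = np.random.default_rng(1)

# Rank-3 target matrix with well-separated singular values.
d = 5
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
V, _ = np.linalg.qr(rng.standard_normal((d, d)))
target_sv = np.array([3.0, 1.0, 0.3])
M = U[:, :3] @ np.diag(target_sv) @ V[:, :3].T

scale = 1e-3                          # small init drives the incremental dynamics
W1 = scale * rng.standard_normal((d, d))
W2 = scale * rng.standard_normal((d, d))
lr = 0.01

first_half = [-1, -1, -1]  # step at which each mode first reaches half its target
for t in range(6000):
    R = W2 @ W1 - M                                     # residual
    W1, W2 = W1 - lr * (W2.T @ R), W2 - lr * (R @ W1.T)  # simultaneous GD step
    sv = np.linalg.svd(W2 @ W1, compute_uv=False)
    for i in range(3):
        if first_half[i] < 0 and sv[i] > target_sv[i] / 2:
            first_half[i] = t

print(first_half)  # modes are picked up in order of decreasing singular value
```

Each mode initially grows roughly like `exp(lr * sigma_i * t)`, so halving the singular value multiplies the time to learn it by about two, which is why the three crossing times come out well separated.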

The Convergence Rate of Neural Networks for Learned Functions of Different Frequencies

- Computer Science, Engineering
- NeurIPS
- 2019

It is shown theoretically and experimentally that a shallow neural network without bias cannot represent or learn simple, low-frequency functions with odd frequencies, and this leads to specific predictions of the time it will take a network to learn functions of varying frequency.