# Redundant representations help generalization in wide neural networks

@inproceedings{Doimo2021RedundantRH,
  title={Redundant representations help generalization in wide neural networks},
  author={Diego Doimo and Aldo Glielmo and Sebastian Goldt and Alessandro Laio},
  year={2021}
}

Deep neural networks (DNNs) defy the classical bias-variance trade-off: adding parameters to a DNN that interpolates its training data will typically improve its generalization performance. Explaining the mechanism behind this “benign overfitting” in deep networks remains an outstanding challenge. Here, we study the last hidden layer representations of various state-of-the-art convolutional neural networks and find that if the last hidden representation is wide enough, its neurons tend to split…

## References

### Understanding deep learning requires rethinking generalization

- Computer Science
- ICLR
- 2017

These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth-two neural networks already have perfect finite-sample expressivity.

### Intrinsic dimension of data representations in deep neural networks

- Computer Science
- NeurIPS
- 2019

The intrinsic dimension (ID) of data representations is studied, i.e. the minimal number of parameters needed to describe a representation, and it is found that, in a trained network, the ID is orders of magnitude smaller than the number of units in each layer.

### A Closer Look at Memorization in Deep Networks

- Computer Science
- ICML
- 2017

The analysis suggests that dataset-independent notions of effective capacity are unlikely to explain the generalization performance of deep networks trained with gradient-based methods, because the training data itself plays an important role in determining the degree of memorization.

### Scaling description of generalization with number of parameters in deep learning

- Computer Science
- Journal of Statistical Mechanics: Theory and Experiment
- 2020

This work relies on the so-called Neural Tangent Kernel, which connects large neural nets to kernel methods, to show that the initialization causes finite-size random fluctuations of the neural net output function f_N around its expectation, which affects the generalization error for classification.

### Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss

- Computer Science
- COLT
- 2020

It is shown that the limits of the gradient flow on exponentially tailed losses can be fully characterized as a max-margin classifier in a certain non-Hilbertian space of functions.

### The Implicit Bias of Gradient Descent on Separable Data

- Computer Science
- J. Mach. Learn. Res.
- 2018

We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the…

### Improving neural networks by preventing co-adaptation of feature detectors

- Computer Science
- ArXiv
- 2012

When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the…

### Wide Residual Networks

- Computer Science
- BMVC
- 2016

This paper conducts a detailed experimental study on the architecture of ResNet blocks and proposes a novel architecture in which the depth of residual networks is decreased and their width is increased. The resulting network structures, called wide residual networks (WRNs), are far superior to their commonly used thin and very deep counterparts.

### On Connectivity of Solutions in Deep Learning: The Role of Over-parameterization and Feature Quality

- Computer Science
- ArXiv
- 2021

This paper presents a novel condition for ensuring the connectivity of two arbitrary points in parameter space and shows that if subsets of features at each layer are linearly separable, then almost no over-parameterization is needed.

### Neural Networks and the Bias/Variance Dilemma

- Computer Science, Psychology
- Neural Computation
- 1992

It is suggested that current-generation feedforward neural networks are largely inadequate for difficult problems in machine perception and machine learning, regardless of parallel-versus-serial hardware or other implementation issues.