Corpus ID: 211989398

Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited

@article{Maddox2020RethinkingPC,
  title={Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited},
  author={Wesley J. Maddox and Gregory W. Benton and Andrew Gordon Wilson},
  journal={ArXiv},
  year={2020},
  volume={abs/2003.02139}
}
Neural networks appear to have mysterious generalization properties when using parameter counting as a proxy for complexity. Indeed, neural networks often have many more parameters than there are data points, yet still provide good generalization performance. Moreover, when we measure generalization as a function of parameters, we see double descent behaviour, where the test error decreases, increases, and then again decreases. We show that many of these properties become understandable when viewed through the lens of effective dimensionality, which measures the dimensionality of the parameter space determined by the data.
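
The effective dimensionality in the title is computed from the eigenvalue spectrum of the Hessian of the loss, N_eff(H, z) = Σ_i λ_i / (λ_i + z) for a regularization constant z > 0: directions with large curvature contribute ≈1, flat directions contribute ≈0. Below is a minimal NumPy sketch of that formula; the function name and the toy eigenvalue spectrum are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def effective_dimensionality(eigenvalues, z=1.0):
    """N_eff(H, z) = sum_i lambda_i / (lambda_i + z), where lambda_i are
    eigenvalues of the Hessian of the loss and z > 0 is a regularization
    constant. Well-determined directions contribute ~1, flat ones ~0."""
    lam = np.asarray(eigenvalues, dtype=float)
    return float(np.sum(lam / (lam + z)))

# Toy spectrum (illustrative): 3 dominant curvature directions among
# 1000 parameters. The effective dimensionality comes out near 3.9,
# far below the raw parameter count of 1000.
eigs = np.concatenate([[100.0, 50.0, 10.0], np.full(997, 1e-3)])
print(effective_dimensionality(eigs, z=1.0))
```

The choice of z sets the curvature scale below which a direction counts as undetermined by the data, which is why the measure can stay small even as the raw parameter count grows.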
13 Citations
• GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
• Generalization bounds for deep learning
• Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling
• Why Flatness Correlates With Generalization For Deep Neural Networks
• Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization (Stanislaw Jastrzebski, D. Arpit, +6 authors, K. Geras; ArXiv, 2020)
• Design of Physical Experiments via Collision-Free Latent Space Optimization
