Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks

@article{Martin2020HeavyTailedUP,
  title={Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks},
  author={Charles H. Martin and Michael W. Mahoney},
  journal={ArXiv},
  year={2020},
  volume={abs/1901.08278}
}
Given two or more Deep Neural Networks (DNNs) with the same or similar architectures, trained on the same dataset but with different solvers, parameters, hyper-parameters, regularization, etc., can we predict which DNN will have the best test accuracy, and can we do so without peeking at the test data? In this paper, we show how to use a new Theory of Heavy-Tailed Self-Regularization (HT-SR) to answer this question. HT-SR suggests, among other things, that modern DNNs exhibit what we call Heavy-Tailed Mechanistic Universality (HT-MU): the correlations in the layer weight matrices can be fit to a power law (PL), with exponents that lie in common Universality classes from Heavy-Tailed Random Matrix Theory.
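As a quick illustration of the kind of diagnostic the abstract describes, here is a minimal Python sketch (not the authors' released code): compute the empirical spectral density (ESD) of each layer weight matrix W, i.e. the eigenvalues of the correlation matrix X = W^T W; fit a power law p(x) ~ x^(-alpha) to the tail of each ESD; and average the fitted exponents over layers. The helper names, the use of a Hill (maximum-likelihood) tail estimator, and the crude median choice of x_min are all assumptions made for illustration.

import numpy as np

def esd(W):
    """Empirical spectral density: eigenvalues of the correlation matrix W^T W."""
    sv = np.linalg.svd(W, compute_uv=False)
    return sv ** 2  # eigenvalues of W^T W are the squared singular values of W

def fit_power_law(eigs, xmin=None):
    """Hill (maximum-likelihood) estimate of the tail exponent alpha, p(x) ~ x^(-alpha)."""
    eigs = np.sort(eigs)
    if xmin is None:
        xmin = np.quantile(eigs, 0.5)  # crude cutoff; careful fits scan over xmin
    tail = eigs[eigs >= xmin]
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

def average_alpha(weight_matrices):
    """Average fitted PL exponent over the layers of one model."""
    return float(np.mean([fit_power_law(esd(W)) for W in weight_matrices]))

# Toy comparison of two "models" (random stand-ins for trained layer weights);
# heavier-tailed weights yield heavier-tailed ESDs and smaller fitted exponents.
rng = np.random.default_rng(0)
model_a = [rng.standard_normal((512, 256)) for _ in range(4)]
model_b = [rng.standard_t(df=3, size=(512, 256)) for _ in range(4)]
print(average_alpha(model_a), average_alpha(model_b))

On series of real pre-trained models, the paper reports that smaller average exponents tend to track better test accuracies, which is what makes a metric of this kind usable without access to the test data.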
13 Citations
  • Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data
  • Multiplicative noise and heavy tails in stochastic optimization
  • Statistical Mechanics Methods for Discovering Knowledge from Modern Production Quality Neural Networks
  • Machine learning identifies scale-free properties in disordered materials
  • Eigenvector Statistics of Lévy Matrices
  • GOE Statistics for Lévy Matrices
