Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks

@article{Martin2020HeavyTailedUP,
  title={Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks},
  author={Charles H. Martin and Michael W. Mahoney},
  journal={ArXiv},
  year={2020},
  volume={abs/1901.08278}
}
  • Charles H. Martin, Michael W. Mahoney
  • Published 2020
  • Computer Science, Mathematics
  • ArXiv
  • Given two or more Deep Neural Networks (DNNs) with the same or similar architectures, and trained on the same dataset, but trained with different solvers, parameters, hyper-parameters, regularization, etc., can we predict which DNN will have the best test accuracy, and can we do so without peeking at the test data? In this paper, we show how to use a new Theory of Heavy-Tailed Self-Regularization (HT-SR) to answer this. HT-SR suggests, among other things, that modern DNNs exhibit what we call…
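
HT-SR works only with the trained weights: roughly, one fits a power-law exponent alpha to the tail of each layer weight matrix's eigenvalue spectrum and compares models by the average exponent, with smaller (heavier-tailed) exponents associated in this line of work with better-generalizing models. Below is a minimal sketch of that computation in Python, not the authors' code: the Hill-style fit, the top-20% tail cut-off, and the toy "models" are assumptions made here purely for illustration.

import numpy as np

def powerlaw_alpha(W):
    """Fit a power-law exponent to the tail of the eigenvalue spectrum of the
    layer correlation matrix X = W^T W / N, using a simple Hill estimator."""
    N = W.shape[0]
    # Eigenvalues of X obtained from the singular values of W.
    evals = np.sort(np.linalg.svd(W, compute_uv=False) ** 2 / N)[::-1]
    k = max(5, len(evals) // 5)   # illustrative tail size: top ~20% of eigenvalues
    xmin = evals[k]               # (k+1)-th largest eigenvalue serves as x_min
    return 1.0 + k / np.sum(np.log(evals[:k] / xmin))

def average_alpha(weight_matrices):
    """Average exponent over layers; lets two models trained on the same data
    be compared without looking at any test data."""
    return float(np.mean([powerlaw_alpha(W) for W in weight_matrices]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins for two trained models with the same architecture:
    # Gaussian layers (light-tailed spectrum) vs. heavier-tailed layers.
    model_a = [rng.standard_normal((512, 256)) for _ in range(4)]
    model_b = [rng.standard_t(df=3, size=(512, 256)) for _ in range(4)]
    print("average alpha, model A:", average_alpha(model_a))
    print("average alpha, model B:", average_alpha(model_b))

To apply this to a real network, one would extract each layer's weight matrix from a trained checkpoint (e.g., reshaping convolutional kernels into 2-D matrices) and pass the list to average_alpha; the ranking of candidate models by such spectral metrics is what the paper compares against reported test accuracies.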
    13 Citations
    Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data
    Multiplicative noise and heavy tails in stochastic optimization
    Statistical Mechanics Methods for Discovering Knowledge from Modern Production Quality Neural Networks
    Sparse Quantized Spectral Clustering
    Machine learning identifies scale-free properties in disordered materials
    Eigenvector Statistics of Lévy Matrices

    References

    Showing 1–10 of 48 references:
    Traditional and Heavy-Tailed Self Regularization in Neural Network Models
    A Surprising Linear Relationship Predicts Test Performance in Deep Networks (highly influential)
    Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning
    Theory IIIb: Generalization in Deep Networks
    Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior
    Learning with Spectral Kernels and Heavy-Tailed Data
    Spectrally-normalized margin bounds for neural networks
    The Implicit Bias of Gradient Descent on Separable Data
    The jamming transition as a paradigm to understand the loss landscape of deep neural networks
    Stronger generalization bounds for deep nets via a compression approach