Random matrix analysis of deep neural network weight matrices

@article{Thamm2022RandomMA,
title={Random matrix analysis of deep neural network weight matrices},
author={Matthias Thamm and Max Staats and Bernd Rosenow},
journal={ArXiv},
year={2022},
volume={abs/2203.14661}
}
• Published 28 March 2022
• Computer Science
• ArXiv
Neural networks have been used successfully in a variety of fields, which has led to a great deal of interest in developing a theoretical understanding of how they store the information needed to perform a particular task. We study the weight matrices of trained deep neural networks using methods from random matrix theory (RMT) and show that the statistics of most of the singular values follow universal RMT predictions. This suggests that they are random and do not contain system specific…
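The universality test described in the abstract can be sketched numerically. A minimal illustration (not the authors' code), using an i.i.d. Gaussian matrix as a stand-in for a trained weight matrix and a simple polynomial unfolding:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trained layer: i.i.d. Gaussian entries, i.e. the pure-noise
# baseline whose singular value statistics follow universal RMT predictions.
n, m = 1000, 300
W = rng.normal(size=(n, m)) / np.sqrt(n)

s = np.sort(np.linalg.svd(W, compute_uv=False))

# Unfold the spectrum: fit a smooth curve to rank-vs-value so the mean level
# spacing becomes 1, then take nearest-neighbour spacings.
ranks = np.arange(len(s))
unfolded = np.polyval(np.polyfit(s, ranks, deg=9), s)
spacings = np.diff(unfolded)
spacings = spacings[spacings > 0]

# For random matrices the spacing histogram follows the Wigner surmise
# P(s) = (pi/2) s exp(-pi s^2 / 4), with mean 1 by construction.
print(f"mean spacing: {spacings.mean():.2f}")  # close to 1
```

For a trained network, the same spacings computed from its weight matrices would be compared against this surmise; agreement indicates the corresponding part of the spectrum is noise-like.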
2 Citations

Figures and Tables from this paper

Boundary between noise and information applied to filtering neural network weight matrices
• Computer Science
ArXiv
• 2022
An algorithm is introduced that both removes small singular values and reduces the magnitude of large singular values, counteracting the effect of level repulsion between the noise and the information parts of the spectrum.
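As a rough illustration of that idea (the function name, rank cutoff, and shrinkage factor below are hypothetical, not the paper's algorithm): zero out the small singular values and mildly shrink the retained ones before reconstructing the matrix.

```python
import numpy as np

def filter_weights(W, rank_keep, shrink=0.9):
    """Hypothetical noise filter: drop small singular values and mildly
    shrink the retained large ones (to counteract the level repulsion
    that pushes them away from the noise bulk)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_filtered = np.zeros_like(s)
    s_filtered[:rank_keep] = shrink * s[:rank_keep]  # SVD sorts s descending
    return U @ np.diag(s_filtered) @ Vt

rng = np.random.default_rng(1)
# Low-rank "information" plus dense noise.
signal = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 80))
W = signal + 0.1 * rng.normal(size=(100, 80))
W_clean = filter_weights(W, rank_keep=5)  # rank-5 reconstruction
```

In practice the cutoff would be chosen from the RMT noise edge rather than fixed by hand.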
Universal characteristics of deep neural network loss surfaces from random matrix theory
• Computer Science
ArXiv
• 2022
This paper considers several aspects of random matrix universality in deep neural networks. Motivated by recent experimental work, we use universal properties of random matrices related to local…

References

Showing 1-10 of 89 references
Beyond Random Matrix Theory for Deep Networks
This work investigates whether the Wigner semi-circle and Marchenko-Pastur distributions, often used in theoretical analyses of deep neural networks, match empirically observed spectral densities, and considers two new classes of matrix ensembles, random Wigner/Wishart ensemble products and percolated Wigner/Wishart ensembles, both of which better match observed spectra.
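The Marchenko-Pastur comparison those ensembles are measured against can be sketched directly: for an i.i.d. matrix, essentially all eigenvalues of the sample covariance fall inside the MP support $[(1-\sqrt{q})^2, (1+\sqrt{q})^2]$. A minimal check (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)

# Eigenvalues of the sample covariance X^T X / n with aspect ratio q = m/n.
n, m = 2000, 500
q = m / n
X = rng.normal(size=(n, m))
eigs = np.linalg.eigvalsh(X.T @ X / n)

# Marchenko-Pastur support edges: (1 +/- sqrt(q))^2.
lo, hi = (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2
inside = np.mean((eigs > lo - 0.05) & (eigs < hi + 0.05))
print(f"fraction inside MP bulk: {inside:.2f}")  # → 1.00
```

Trained-network spectra that deviate from this bulk are what motivate the modified ensembles above.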
Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning
• Computer Science
J. Mach. Learn. Res.
• 2021
A theory to identify 5+1 Phases of Training, corresponding to increasing amounts of Implicit Self-Regularization; the theory demonstrates that DNN optimization with larger batch sizes leads to less-well implicitly-regularized models and provides an explanation for the generalization gap phenomenon.
The Emergence of Spectral Universality in Deep Networks
• Computer Science
AISTATS
• 2018
This work uses powerful tools from free probability theory to provide a detailed analytic understanding of how a deep network's Jacobian spectrum depends on various hyperparameters including the nonlinearity, the weight and bias distributions, and the depth.
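For intuition about the Jacobian spectra studied there, the linear case is easy to simulate: the input-output Jacobian of a deep linear network is simply the product of its weight matrices. A toy sketch (linear network only; the paper's free-probability analysis also covers nonlinearities and bias distributions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Input-output Jacobian of a depth-10 linear network: product of its weights.
width, depth = 200, 10
J = np.eye(width)
for _ in range(depth):
    W = rng.normal(size=(width, width)) / np.sqrt(width)  # variance 1/width
    J = W @ J

# The singular value spectrum of J is the object free probability
# characterizes in the large-width limit.
sv = np.linalg.svd(J, compute_uv=False)
print(f"max/min singular value: {sv.max():.3f} / {sv.min():.3g}")
```

Even at critical variance scaling, the spread between the largest and smallest singular values grows with depth, which is the hyperparameter dependence the paper analyzes.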
Understanding the difficulty of training deep feedforward neural networks
• Computer Science
AISTATS
• 2010
The objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future.
Understanding deep learning (still) requires rethinking generalization
• Computer Science
Commun. ACM
• 2021
These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and corroborate these experimental findings with a theoretical construction showing that simple depth two neural networks already have perfect finite sample expressivity.
Deep learning generalizes because the parameter-function map is biased towards simple functions
• Computer Science
ICLR
• 2019
This paper argues that the parameter-function map of many DNNs should be exponentially biased towards simple functions, and provides clear evidence for this strong simplicity bias in a model DNN for Boolean functions, as well as in much larger fully connected and convolutional networks applied to CIFAR10 and MNIST.
Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data
• Computer Science
Nature communications
• 2021
The techniques can be used to identify when a pretrained DNN has problems that cannot be detected simply by examining training/test accuracies, and it is shown how poorly-trained (and/or poorly fine-tuned) models may exhibit both Scale Collapse and unusually large PL exponents, in particular for recent NLP models.
A Random Matrix Approach to Neural Networks
• Computer Science, Mathematics
ArXiv
• 2017
It is proved that, as $n,p,T$ grow large at the same rate, the resolvent $Q=(G+\gamma I_T)^{-1}$ for $\gamma>0$ behaves similarly to that found in sample covariance matrix models, which enables estimation of the asymptotic performance of single-layer random neural networks.
Qualitatively characterizing neural network optimization problems
• Computer Science
ICLR
• 2015
A simple analysis technique is introduced to look for evidence that state-of-the-art neural networks are overcoming local optima, and finds that, on a straight path from initialization to solution, a variety of state-of-the-art neural networks never encounter any significant obstacles.