Universal characteristics of deep neural network loss surfaces from random matrix theory

Nicholas P. Baskerville, Jonathan P. Keating, Francesco Mezzadri, Joseph Najnudel, Diego Granziol
This paper considers several aspects of random matrix universality in deep neural networks. Motivated by recent experimental work, we use universal properties of random matrices related to local statistics to derive practical implications for deep neural networks, based on a realistic model of their Hessians. In particular, we derive universal aspects of outliers in the spectra of deep neural networks and demonstrate the important role of random matrix local laws in popular pre-conditioning…

Nonlinear random matrix theory for deep learning
This work demonstrates that the pointwise nonlinearities typically applied in neural networks can be incorporated into a standard method of proof in random matrix theory known as the moments method, and identifies an intriguing new class of activation functions with favorable properties.
On Random Matrices Arising in Deep Neural Networks: General I.I.D. Case
  • L. Pastur, V. Slavin
  • Random Matrices: Theory and Applications, 2022
This paper generalizes the results of [22] to the case where the entries of the synaptic weight matrices are independent identically distributed random variables with zero mean and finite fourth moment, and extends the property of so-called macroscopic universality to the random matrices considered.
Appearance of random matrix theory in deep learning
Random matrix analysis of deep neural network weight matrices
The weight matrices of trained deep neural networks are studied using methods from random matrix theory (RMT) and it is shown that the statistics of most of the singular values follow universal RMT predictions, suggesting that they are random and do not contain system specific information.
On Random Matrices Arising in Deep Neural Networks. Gaussian Case
The paper deals with the distribution of singular values of products of random matrices arising in the analysis of deep neural networks, using a version of the standard techniques of random matrix theory under the assumption that the entries of the data matrices are independent Gaussian random variables.
The loss surfaces of neural networks with general activation functions
A new path through the spin glass complexity calculations is charted using supersymmetric methods in random matrix theory which may prove useful in other contexts.
Beyond Random Matrix Theory for Deep Networks
This work investigates whether the Wigner semicircle and Marchenko-Pastur distributions, often used in theoretical analyses of deep neural networks, match empirically observed spectral densities, and considers two new classes of matrix ensembles, random Wigner/Wishart ensemble products and percolated Wigner ensembles, both of which better match the observed spectra.
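Several of the papers above compare empirical spectra against the Marchenko-Pastur law. A minimal sketch of that comparison in plain NumPy follows; the matrix sizes and the aspect ratio `q = p/n = 0.25` are illustrative choices, not values taken from any of the papers.

```python
import numpy as np

def marchenko_pastur_pdf(x, q, sigma2=1.0):
    """Marchenko-Pastur density for a sample covariance matrix with
    aspect ratio q = p/n (0 < q <= 1) and entry variance sigma2."""
    lam_minus = sigma2 * (1.0 - np.sqrt(q)) ** 2
    lam_plus = sigma2 * (1.0 + np.sqrt(q)) ** 2
    pdf = np.zeros_like(x, dtype=float)
    inside = (x > lam_minus) & (x < lam_plus)
    pdf[inside] = np.sqrt((lam_plus - x[inside]) * (x[inside] - lam_minus)) / (
        2.0 * np.pi * sigma2 * q * x[inside]
    )
    return pdf

# Eigenvalues of a p x p Wishart (sample covariance) matrix with q = p/n = 0.25.
rng = np.random.default_rng(0)
n, p = 4000, 1000
X = rng.standard_normal((p, n))
evals = np.linalg.eigvalsh(X @ X.T / n)

# The empirical spectrum should fill the MP support [(1-sqrt(q))^2, (1+sqrt(q))^2],
# i.e. roughly [0.25, 2.25] for q = 0.25.
print(evals.min(), evals.max())
```

A histogram of `evals` plotted against `marchenko_pastur_pdf` makes the bulk agreement (or, for trained network Hessians, the outliers that escape the bulk) visible at a glance.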
The Emergence of Spectral Universality in Deep Networks
This work uses powerful tools from free probability theory to provide a detailed analytic understanding of how a deep network's Jacobian spectrum depends on various hyperparameters including the nonlinearity, the weight and bias distributions, and the depth.
The Loss Surfaces of Multilayer Networks
It is proved that recovering the global minimum becomes harder as the network size increases, and that this is in practice irrelevant, as the global minimum often leads to overfitting.
A random matrix theory approach to damping in deep learning
A novel random-matrix-theory-based damping learner for second-order optimisers, inspired by linear shrinkage estimation, is developed, and it is demonstrated that the derived method works well with adaptive gradient methods such as Adam.
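The linear shrinkage idea behind such damping can be sketched as follows; `shrinkage_precondition`, the fixed `alpha`, and the toy curvature matrix are illustrative assumptions for this note, not the adaptive construction derived in the paper.

```python
import numpy as np

def shrinkage_precondition(grad, H_est, alpha=0.1):
    """Return H_shrunk^{-1} @ grad, where H_shrunk is a linearly shrunk
    version of a noisy curvature estimate:

        H_shrunk = (1 - alpha) * H_est + alpha * mu * I,  mu = tr(H_est) / d.

    Shrinking toward mu * I damps the smallest, noisiest eigendirections,
    which would otherwise blow up the preconditioned step.
    """
    d = H_est.shape[0]
    mu = np.trace(H_est) / d
    H_shrunk = (1.0 - alpha) * H_est + alpha * mu * np.eye(d)
    return np.linalg.solve(H_shrunk, grad)

# A singular curvature estimate: the raw Newton step is undefined,
# but the shrunk step is well behaved.
step = shrinkage_precondition(np.array([3.0, 1.0]), np.diag([2.0, 0.0]), alpha=0.5)
print(step)  # [2. 2.]
```

In a second-order optimiser the same shrinkage would be applied to the running curvature estimate before each preconditioned update, with `alpha` chosen from the noise level of the estimate rather than fixed by hand.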