# Nonlinear random matrix theory for deep learning

```bibtex
@article{Pennington2017NonlinearRM,
  title   = {Nonlinear random matrix theory for deep learning},
  author  = {Jeffrey Pennington and Pratik Worah},
  journal = {Journal of Statistical Mechanics: Theory and Experiment},
  year    = {2019},
  volume  = {2019}
}
```

Neural network configurations with random weights play an important role in the analysis of deep learning. They define the initial loss landscape and are closely related to kernel and random feature methods. Despite the fact that these networks are built out of random matrices, the vast and powerful machinery of random matrix theory has so far found limited success in studying them. A main obstacle in this direction is that neural networks are nonlinear, which prevents the straightforward…

## 136 Citations

### Eigenvalue Distribution of Large Random Matrices Arising in Deep Neural Networks: Orthogonal Case

- Computer Science, Journal of Mathematical Physics
- 2022

This paper justifies the validity of the mean field approximation in the infinite width limit for deep untrained neural networks and extends the macroscopic universality of random matrix theory to this new class of random matrices.

### Random matrix analysis of deep neural network weight matrices

- Computer Science, ArXiv
- 2022

The weight matrices of trained deep neural networks are studied using methods from random matrix theory (RMT) and it is shown that the statistics of most of the singular values follow universal RMT predictions, suggesting that they are random and do not contain system-specific information.
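The universal RMT prediction in question can be illustrated with a minimal numpy sketch (a toy illustration, not the paper's code): the squared singular values of an i.i.d. Gaussian matrix with variance-1/n entries fall within the Marchenko-Pastur support edges.

```python
# Toy check (not from the paper): singular values of an i.i.d. Gaussian
# matrix against the Marchenko-Pastur support edges.
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 500                 # matrix shape; aspect ratio c = p / n
c = p / n

W = rng.standard_normal((n, p)) / np.sqrt(n)   # variance-1/n entries
svals = np.linalg.svd(W, compute_uv=False)
eigs = svals ** 2                # eigenvalues of W^T W

# Marchenko-Pastur support edges for ratio c: (1 -/+ sqrt(c))^2
lo, hi = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
print(eigs.min(), eigs.max())    # should land close to (lo, hi)
```

Deviations of the empirical singular-value statistics from this baseline are what would signal learned, system-specific structure.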

### The Spectrum of the Fisher Information Matrix of a Single-Hidden-Layer Neural Network

- Computer Science, NeurIPS
- 2018

This work extends a recently-developed framework for studying spectra of nonlinear random matrices to characterize an important measure of curvature, namely the eigenvalues of the Fisher information matrix, and finds that linear networks suffer worse conditioning than nonlinear networks and that nonlinear networks are generically non-degenerate.

### Largest Eigenvalues of the Conjugate Kernel of Single-Layered Neural Networks

- Mathematics, Computer Science, ArXiv
- 2022

This work relates the asymptotic limit of the largest eigenvalue of the nonlinear model to that of an information-plus-noise random matrix, establishing a possible phase transition depending on the function f and the distribution of W and X.

### A Random Matrix Perspective on Mixtures of Nonlinearities for Deep Learning

- Computer Science, ArXiv
- 2019

Intriguingly, it is found that a mixture of nonlinearities can outperform the best single nonlinearity on the noisy autoencoding task, suggesting that mixtures of nonlinearities might be useful for approximate kernel methods or neural network architecture design.

### Analysis of One-Hidden-Layer Neural Networks via the Resolvent Method

- Mathematics, Computer Science, NeurIPS
- 2021

The Stieltjes transform of the limiting spectral distribution satisfies a quartic self-consistent equation up to some error terms, which is exactly the equation obtained by Pennington and Worah and Benigni and Péché with the moment method approach.

### Universal characteristics of deep neural network loss surfaces from random matrix theory

- Computer Science, ArXiv
- 2022

This paper considers several aspects of random matrix universality in deep neural networks. Motivated by recent experimental work, we use universal properties of random matrices related to local…

### On the Approximation Lower Bound for Neural Nets with Random Weights

- Computer Science, ArXiv
- 2020

It is shown that, despite the well-known fact that a shallow neural network is a universal approximator, a random net cannot achieve zero approximation error even for smooth functions, and it is proved that if the proposal distribution is compactly supported, then this lower bound is positive.

### A Random Matrix Perspective on Mixtures of Nonlinearities in High Dimensions

- Computer Science, AISTATS
- 2022

This work analyzes the performance of random feature regression with features F = f(WX + B) for a random weight matrix W and bias vector B, obtaining exact formulae for the asymptotic training and test errors for data generated by a linear teacher model.
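As a concrete, hypothetical instance of this setup (dimensions, the tanh nonlinearity, and the ridge parameter are illustrative choices, not the paper's), the following sketch runs ridge regression on random features F = f(WX + B) against a linear teacher:

```python
# Illustrative random feature regression with a linear teacher
# (assumed setup, not the paper's code).
import numpy as np

rng = np.random.default_rng(1)
d, n_feat, n = 100, 200, 1000        # input dim, random features, samples

X = rng.standard_normal((d, n))
beta = rng.standard_normal(d) / np.sqrt(d)
y = beta @ X                          # linear teacher targets

W = rng.standard_normal((n_feat, d)) / np.sqrt(d)   # random weights
B = rng.standard_normal((n_feat, 1))                # random biases
F = np.tanh(W @ X + B)                # features f(WX + B)

lam = 1e-2                            # ridge parameter
a = np.linalg.solve(F @ F.T / n + lam * np.eye(n_feat), F @ y / n)
train_err = np.mean((a @ F - y) ** 2)
print(train_err)                      # well below the variance of y
```

The asymptotic formulae in the paper characterize exactly such errors as d, n_feat, and n grow proportionally.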

### Universal statistics of Fisher information in deep neural networks: mean field approach

- Computer Science, AISTATS
- 2019

Novel statistics of FIM are revealed that are universal among a wide class of DNNs and can be connected to a norm-based capacity measure of generalization ability and quantitatively estimate an appropriately sized learning rate for gradient methods to converge.

## References

Showing 1-10 of 24 references

### A Correspondence Between Random Neural Networks and Statistical Field Theory

- Computer Science, ArXiv
- 2017

This work shows that the distribution of pre-activations in random neural networks can be exactly mapped onto lattice models in statistical physics, and argues that several previous investigations of stochastic networks actually studied a particular factorial approximation to the full lattice model.

### Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

- Computer Science, ICLR
- 2014

It is shown that deep linear networks exhibit nonlinear learning phenomena similar to those seen in simulations of nonlinear networks, including long plateaus followed by rapid transitions to lower error solutions, and faster convergence from greedy unsupervised pretraining initial conditions than from random initial conditions.

### Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice

- Computer Science, NIPS
- 2017

This work uses powerful tools from free probability theory to compute analytically the entire singular value distribution of a deep network's input-output Jacobian, and reveals that controlling the entire distribution of Jacobian singular values is an important design consideration in deep learning.
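The design consideration can be seen numerically in a small sketch (an assumed setup, not the authors' experiments): the singular values of a product of Haar-orthogonal layer Jacobians stay perfectly conditioned, while a product of Gaussian ones spreads out with depth.

```python
# Toy comparison of Jacobian conditioning: orthogonal vs Gaussian layers
# (illustrative, not the paper's code).
import numpy as np

rng = np.random.default_rng(3)

def haar_orthogonal(n):
    # QR of a Gaussian matrix, with a sign fix on the diagonal of R,
    # yields a Haar-distributed orthogonal matrix.
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

n, depth = 100, 10
J_orth = np.eye(n)
J_gauss = np.eye(n)
for _ in range(depth):
    J_orth = haar_orthogonal(n) @ J_orth
    J_gauss = (rng.standard_normal((n, n)) / np.sqrt(n)) @ J_gauss

s_orth = np.linalg.svd(J_orth, compute_uv=False)
s_gauss = np.linalg.svd(J_gauss, compute_uv=False)
cond_orth = s_orth.max() / s_orth.min()    # stays at 1: dynamical isometry
cond_gauss = s_gauss.max() / s_gauss.min() # blows up with depth
print(cond_orth, cond_gauss)
```

The free-probability analysis in the paper makes this contrast precise for the full singular value distribution, not just the condition number.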

### Exponential expressivity in deep neural networks through transient chaos

- Computer Science, NIPS
- 2016

The theoretical analysis of the expressive power of deep networks broadly applies to arbitrary nonlinearities, and provides a quantitative underpinning for previously abstract notions about the geometry of deep functions.

### The Loss Surfaces of Multilayer Networks

- Computer Science, AISTATS
- 2015

It is proved that recovering the global minimum becomes harder as the network size increases, and that doing so is in practice irrelevant, as the global minimum often leads to overfitting.

### The spectrum of kernel random matrices

- Computer Science, Mathematics
- 2010

Surprisingly, it is shown that in high-dimensions, and for the models the authors analyze, the problem becomes essentially linear—which is at odds with heuristics sometimes used to justify the usage of these methods.

### A Random Matrix Approach to Neural Networks

- Computer Science, Mathematics, ArXiv
- 2017

It is proved that, as $n,p,T$ grow large at the same rate, the resolvent $Q=(G+\gamma I_T)^{-1}$, for $\gamma>0$, behaves similarly to the resolvents met in sample covariance matrix models, which enables the estimation of the asymptotic performance of single-layer random neural networks.
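A minimal sketch of the object in question (illustrative dimensions, not the paper's code): the normalized trace of the resolvent of the Gram matrix is the Stieltjes transform of its spectrum evaluated at z = -gamma.

```python
# Resolvent of a Gram matrix and its normalized trace
# (toy illustration of the setup).
import numpy as np

rng = np.random.default_rng(2)
p, T = 250, 500                            # feature dimension, samples

X = rng.standard_normal((p, T)) / np.sqrt(p)
G = X.T @ X                                # Gram matrix, shape (T, T)
gamma = 0.5
Q = np.linalg.inv(G + gamma * np.eye(T))   # resolvent Q = (G + gamma I_T)^{-1}
m = np.trace(Q) / T                        # Stieltjes transform at z = -gamma
print(m)
```

Since G is positive semidefinite, every eigenvalue of Q lies in (0, 1/gamma], which is the deterministic behavior the sample-covariance analogy makes quantitative.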

### Spectral density of products of Wishart dilute random matrices. Part I: the dense case

- Mathematics, Computer Science
- 2014

This work derives that the spectral density is a solution of a polynomial equation of degree $M+1$, obtains exact expressions for it for $M=1$, $2$, and $3$, and makes some observations for general $M$, based admittedly on some weak numerical evidence.

### Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

- Computer Science, ICML
- 2015

Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.

### On the Expressive Power of Deep Neural Networks

- Computer Science, ICML
- 2017

We propose a new approach to the problem of neural network expressivity, which seeks to characterize how structural properties of a neural network family affect the functions it is able to compute.…