Compressing Heavy-Tailed Weight Matrices for Non-Vacuous Generalization Bounds
@article{Shin2021CompressingHW,
  title={Compressing Heavy-Tailed Weight Matrices for Non-Vacuous Generalization Bounds},
  author={John Y. Shin},
  journal={ArXiv},
  year={2021},
  volume={abs/2105.11025}
}
Heavy-tailed distributions have been studied in statistics, random matrix theory, physics, and econometrics as models of correlated systems, among other domains. Further, heavy-tailed eigenvalue distributions of the covariance matrices of neural network weight matrices have been shown empirically to correlate with test set accuracy in several works (e.g. [1]), but a formal relationship between heavy-tailed parameter distributions and generalization bounds had yet to be demonstrated. In this…
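For intuition about the kind of empirical measurement behind this correlation, a minimal sketch (not code from the paper): compute the eigenvalue spectrum of a layer's weight covariance matrix and estimate its tail exponent with a Hill estimator. The Pareto-distributed stand-in for a trained layer, the layer shape, and the choice of `k` are illustrative assumptions.

```python
import numpy as np

def hill_estimator(samples, k):
    """Hill estimate of the tail exponent from the k largest samples."""
    x = np.sort(samples)[-k:]             # k largest order statistics
    return k / np.sum(np.log(x / x[0]))   # x[0] is the k-th largest value

rng = np.random.default_rng(0)

# Stand-in for a trained layer: signed Pareto entries rather than Gaussian
# ones, mimicking the heavy-tailed spectra reported for well-trained nets.
n, m = 512, 1024
W = rng.pareto(2.5, size=(n, m)) * rng.choice([-1.0, 1.0], size=(n, m))

# Eigenvalues of the empirical weight covariance matrix W W^T / m.
eigs = np.linalg.eigvalsh(W @ W.T / m)

print(f"estimated tail exponent: {hill_estimator(eigs, k=50):.2f}")
```

In the works cited, smaller fitted exponents (heavier spectral tails) tend to accompany better test accuracy.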
2 Citations
Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks
- Computer Science, NeurIPS
- 2021
This study links compressibility to two recently established properties of SGD and proves that the networks are guaranteed to be '$\ell_p$-compressible', with the compression errors of different pruning techniques becoming arbitrarily small as the network size increases.
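As a toy illustration of why heavy-tailed weights are compressible (a sketch under assumed Pareto weights, not the paper's construction): zero out all but the largest-magnitude entries of a weight vector and compare the relative $\ell_2$ error for Gaussian versus heavy-tailed weights.

```python
import numpy as np

def prune_error(w, keep_frac):
    """Relative l2 error after zeroing all but the largest-magnitude entries."""
    k = int(keep_frac * w.size)
    drop = np.argsort(np.abs(w))[:-k]   # indices of the entries we zero out
    pruned = w.copy()
    pruned[drop] = 0.0
    return np.linalg.norm(w - pruned) / np.linalg.norm(w)

rng = np.random.default_rng(0)
n = 100_000
gaussian = rng.normal(size=n)
heavy = rng.pareto(1.5, size=n) * rng.choice([-1.0, 1.0], size=n)

for name, w in [("gaussian", gaussian), ("heavy-tailed", heavy)]:
    print(name, [round(prune_error(w, f), 3) for f in (0.01, 0.05, 0.20)])
```

The heavy-tailed vector concentrates most of its norm in a few large entries, so magnitude pruning discards comparatively little of it.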
Deep neural networks with dependent weights: Gaussian Process mixture limit, heavy tails, sparsity and compressibility
- Computer Science, ArXiv
- 2022
The infinite-width limit of deep feedforward neural networks whose weights are dependent and modelled via a mixture of Gaussian distributions is studied, and it is shown that, in this regime, the weights are compressible and feature learning is possible.
References
SHOWING 1-10 OF 36 REFERENCES
Spectral Properties of Heavy-Tailed Random Matrices
- Mathematics
- 2018
Classical random matrix theory studies asymptotic spectral properties of random matrices as their dimensions grow to infinity. In contrast, the non-asymptotic branch of the theory focuses on…
Traditional and Heavy-Tailed Self Regularization in Neural Network Models
- Computer Science, ICML
- 2019
A novel form of Heavy-Tailed Self-Regularization, similar to the self-organization seen in the statistical physics of disordered systems, is identified; it can depend strongly on the many knobs of the training process.
Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks
- Computer Science, SDM
- 2020
A new Theory of Heavy-Tailed Self-Regularization (HT-SR) is used to develop a universal capacity control metric, a weighted average of PL (power-law) exponents, which correlates very well with the reported test accuracies of these DNNs.
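A sketch of how such a metric can be assembled, assuming (as in one HT-SR variant) that each layer's power-law exponent $\alpha_l$ is weighted by the log of that layer's largest covariance eigenvalue; the per-layer values below are placeholders, not measurements from any network.

```python
import numpy as np

# Hypothetical per-layer power-law exponents alpha_l and largest eigenvalues
# lambda_max_l of each layer's weight covariance matrix (placeholder values).
alphas = np.array([2.1, 2.8, 3.3, 4.0])
lambda_max = np.array([35.0, 22.0, 14.0, 9.0])

# One common HT-SR variant of the metric: average each layer's exponent
# weighted by log(lambda_max) of that layer.
weighted_alpha = np.mean(alphas * np.log(lambda_max))
print(f"weighted-alpha capacity metric: {weighted_alpha:.2f}")
```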
The Heavy-Tail Phenomenon in SGD
- Computer Science, ICML
- 2021
It is shown that, depending on the structure of the Hessian of the loss at the minimum and on the choices of the algorithm parameters $\eta$ (step size) and $b$ (batch size), the SGD iterates converge to a heavy-tailed stationary distribution; these results are the first of their kind to rigorously characterize the empirically observed heavy-tailed behavior of SGD.
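A toy reproduction of this dependence on $\eta$ and $b$ (a sketch, not the paper's experiment): online SGD on a one-dimensional Gaussian least-squares problem with optimum at zero is a random linear recursion, and a Hill estimate of the tail index of its iterates drops as the step size grows and the batch size shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_final_iterate(eta, b, steps=500):
    """Online SGD on a 1-d Gaussian least-squares problem with theta* = 0."""
    theta = 0.0
    for _ in range(steps):
        x = rng.normal(size=b)
        y = rng.normal(size=b)
        theta -= eta * np.mean(x * (x * theta - y))  # mini-batch gradient
    return abs(theta)

def hill(samples, k=100):
    s = np.sort(samples)[-k:]           # k largest order statistics
    return k / np.sum(np.log(s / s[0]))

# Larger step size with smaller batch => smaller tail index (heavier tails).
for eta, b in [(0.3, 10), (0.9, 1)]:
    iterates = np.array([sgd_final_iterate(eta, b) for _ in range(1000)])
    print(f"eta={eta}, b={b}: Hill tail-index estimate {hill(iterates):.2f}")
```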
Multiplicative noise and heavy tails in stochastic optimization
- Computer Science, ICML
- 2021
By modelling stochastic optimization algorithms as discrete random recurrence relations, it is shown that multiplicative noise, which commonly arises from variance in local rates of convergence, results in heavy-tailed stationary behaviour in the parameters.
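For background (standard Kesten-type theory rather than the paper's own statement), the simplest such recurrence already exhibits this behaviour:

$$X_{t+1} = A_t X_t + B_t, \qquad (A_t, B_t) \ \text{i.i.d.}$$

If $\mathbb{E}[\log|A_t|] < 0$ (contraction on average) but $\mathbb{P}(|A_t| > 1) > 0$ (occasional expansion), then under mild regularity conditions the stationary solution has a power-law tail, $\mathbb{P}(|X_\infty| > x) \sim c\, x^{-\alpha}$, with the tail index $\alpha$ solving $\mathbb{E}[|A_t|^{\alpha}] = 1$.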
On the top eigenvalue of heavy-tailed random matrices
- Mathematics
- 2007
We study the statistics of the largest eigenvalue $\lambda_{\max}$ of $N \times N$ random matrices with IID entries of variance $1/N$, but with power-law tails $P(M_{ij}) \sim |M_{ij}|^{-1-\mu}$. When $\mu > 4$, $\lambda_{\max}$ converges to 2 with…
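A quick numerical illustration of the quoted regime (a sketch; the symmetrization and normalization here are my own choices): for $\mu > 4$ the rescaled largest eigenvalue sits near the semicircle edge at 2, while for $\mu < 4$ it fluctuates far above it.

```python
import numpy as np

rng = np.random.default_rng(0)

def lambda_max(N, mu):
    """Largest eigenvalue of a symmetric random matrix whose entries have
    Pareto tails P(|M_ij| > t) ~ t^(-mu), rescaled to entry variance 1/N."""
    E = rng.pareto(mu, size=(N, N)) * rng.choice([-1.0, 1.0], size=(N, N))
    E -= E.mean()
    M = (E + E.T) / np.sqrt(2 * N * E.var())
    return np.linalg.eigvalsh(M)[-1]

for mu in (5.0, 2.5):
    samples = [lambda_max(400, mu) for _ in range(5)]
    print(f"mu={mu}: lambda_max samples {np.round(samples, 2)}")
```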
On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks
- Computer Science, ArXiv
- 2019
It is argued that the Gaussianity assumption might fail to hold in deep learning settings, rendering Brownian motion-based analyses inappropriate, and an explicit connection is established between the convergence rate of SGD to a local minimum and the tail-index $\alpha$.
Sharp Concentration Results for Heavy-Tailed Distributions
- Mathematics, ArXiv
- 2020
The main theorem not only recovers some existing results, such as concentration bounds for sums of sub-Weibull random variables, but also produces new results for sums of random variables with heavier tails, based on standard truncation arguments.
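For context, the standard sub-Weibull tail condition that these results generalize (a textbook definition, not the paper's statement): a random variable $X$ is sub-Weibull with tail parameter $\theta > 0$ if

$$\mathbb{P}(|X| \ge t) \le 2 \exp\!\left(-(t/K)^{1/\theta}\right) \quad \text{for all } t \ge 0,$$

for some scale constant $K$; $\theta = 1/2$ recovers sub-Gaussian tails, $\theta = 1$ sub-exponential tails, and larger $\theta$ means heavier tails.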
Stronger generalization bounds for deep nets via a compression approach
- Computer Science, ICML
- 2018
These results provide some theoretical justification for the widespread empirical success in compressing deep nets and show generalization bounds that are orders of magnitude better in practice.
Level Statistics and Localization Transitions of Lévy Matrices.
- Mathematics, Physical Review Letters
- 2016
This work establishes the equation determining the localization transition and obtains the phase diagram, showing that the eigenvalue statistics are the same as those of the Gaussian orthogonal ensemble throughout the delocalized phase and are Poisson-like in the localized phase.