Periodic Spectral Ergodicity: A Complexity Measure for Deep Neural Networks and Neural Architecture Search
@article{Szen2019PeriodicSE,
  title   = {Periodic Spectral Ergodicity: A Complexity Measure for Deep Neural Networks and Neural Architecture Search},
  author  = {M. S{\"u}zen and Joan J. Cerd{\`a} and Cornelius Weber},
  journal = {ArXiv},
  year    = {2019},
  volume  = {abs/1911.07831}
}
Establishing associations between the structure and the generalisation ability of deep neural networks (DNNs) is a challenging task in modern machine learning. Producing solutions to this challenge will bring progress both in the theoretical understanding of DNNs and in building new architectures efficiently. In this work, we address this challenge by developing a new complexity measure based on the concept of Periodic Spectral Ergodicity (PSE) originating from quantum statistical mechanics…
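Since the abstract only names the concept, the snippet below is an illustrative sketch rather than the paper's definition of PSE: it histograms the eigenvalue spectrum of each layer's weight matrix (via squared singular values) and reports how far the per-layer spectral densities deviate from their ensemble average, in the spirit of a spectral-ergodicity statistic. The function names, binning, and deviation metric are assumptions made here, not taken from the paper.

```python
import numpy as np

def layer_spectrum(weight, n_bins=64, value_range=(0.0, 4.0)):
    """Density histogram of the eigenvalues of W W^T (squared singular values)."""
    eigvals = np.linalg.svd(weight, compute_uv=False) ** 2
    hist, _ = np.histogram(eigvals, bins=n_bins, range=value_range, density=True)
    return hist

def spectral_ergodicity(weights, n_bins=64, value_range=(0.0, 4.0)):
    """
    Mean squared deviation of each layer's spectral density from the
    ensemble-averaged density: small values mean the layers' spectra look alike.
    """
    densities = np.array([layer_spectrum(W, n_bins, value_range) for W in weights])
    mean_density = densities.mean(axis=0)
    return np.mean((densities - mean_density) ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "network": random layer weight matrices of different shapes.
    layers = [rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, m))
              for n, m in [(128, 128), (128, 64), (64, 64), (64, 10)]]
    print("ergodicity statistic:", spectral_ergodicity(layers))
```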
One Citation
Automated Deep Learning: Neural Architecture Search Is Not the End
- Computer Science, ArXiv
- 2021
Three types of optimization routines are typically employed in AutoML/AutoDL: population-based methods (which operate on sets of configurations called populations), distribution-based methods, and Bayesian optimization.
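As an illustration of the population-based family mentioned above (and not the cited paper's algorithm), the toy loop below scores a population of hyperparameter configurations, keeps the better half, and refills the population with mutated children; the `depth`/`width` hyperparameters and the objective are hypothetical stand-ins for a real train-and-validate step.

```python
import random

def evaluate(config):
    """Stand-in objective: a real NAS/AutoML run would train and validate a
    model built from `config`. Here it is a toy score peaked at depth 6, width 128."""
    return -(config["depth"] - 6) ** 2 - (config["width"] - 128) ** 2 / 100.0

def mutate(config):
    """Perturb one hyperparameter to create a child configuration."""
    child = dict(config)
    if random.random() < 0.5:
        child["depth"] = max(1, child["depth"] + random.choice([-1, 1]))
    else:
        child["width"] = max(8, child["width"] + random.choice([-16, 16]))
    return child

def population_search(pop_size=20, generations=30):
    # Initial population: a set of random configurations.
    population = [{"depth": random.randint(1, 12),
                   "width": random.choice(range(8, 257, 8))}
                  for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=evaluate, reverse=True)
        parents = scored[: pop_size // 2]            # keep the better half
        children = [mutate(random.choice(parents))   # refill with mutated children
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=evaluate)

if __name__ == "__main__":
    random.seed(0)
    print("best configuration:", population_search())
```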
References
Showing 1-10 of 27 references
Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice
- Computer Science, NIPS
- 2017
This work uses powerful tools from free probability theory to analytically compute the entire singular value distribution of a deep network's input-output Jacobian, revealing that controlling this distribution is an important design consideration in deep learning.
On the Complexity of Neural Network Classifiers: A Comparison Between Shallow and Deep Architectures
- Computer Science, IEEE Transactions on Neural Networks and Learning Systems
- 2014
A new measure based on topological concepts is introduced to evaluate the complexity of the function implemented by a neural network used for classification. The results support the idea that deep networks implement functions of higher complexity, so that, with the same number of resources, they can address more difficult problems.
The Emergence of Spectral Universality in Deep Networks
- Computer Science, AISTATS
- 2018
This work uses powerful tools from free probability theory to provide a detailed analytic understanding of how a deep network's Jacobian spectrum depends on various hyperparameters including the nonlinearity, the weight and bias distributions, and the depth.
Neural Persistence: A Complexity Measure for Deep Neural Networks Using Algebraic Topology
- Computer Science, ICLR
- 2019
This work proposes neural persistence, a complexity measure for neural network architectures based on topological data analysis of weighted stratified graphs, and derives a neural-persistence-based stopping criterion that shortens training while achieving accuracies comparable to early stopping based on validation loss.
Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning
- Computer Science, J. Mach. Learn. Res.
- 2021
A theory is developed to identify 5+1 Phases of Training, corresponding to increasing amounts of Implicit Self-Regularization; it demonstrates that DNN optimization with larger batch sizes leads to less well implicitly regularized models, providing an explanation for the generalization gap phenomenon.
Traditional and Heavy-Tailed Self Regularization in Neural Network Models
- Computer Science, ICML
- 2019
A novel form of Heavy-Tailed Self-Regularization is identified, similar to the self-organization seen in the statistical physics of disordered systems, which can depend strongly on the many knobs of the training process.
Detecting Statistical Interactions from Neural Network Weights
- Computer Science, ICLR
- 2018
This paper develops a novel framework for detecting statistical interactions captured by a feedforward multilayer neural network by directly interpreting its learned weights. The key observation is that interactions between input features arise from the non-additive effect of nonlinear activation functions, and that interacting paths are encoded in the weight matrices.
Empirical Analysis of the Hessian of Over-Parametrized Neural Networks
- Computer Science, ICLR
- 2018
A case is made that links two observations: small-batch and large-batch gradient descent appear to converge to different basins of attraction, but are in fact connected through a flat region and so belong to the same basin.
Human-level control through deep reinforcement learning
- Computer Science, Nature
- 2015
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.