• Corpus ID: 236772380

The Separation Capacity of Random Neural Networks

@article{Dirksen2021TheSC,
  title={The Separation Capacity of Random Neural Networks},
  author={Sjoerd Dirksen and Martin Genzel and Laurent Jacques and Alexander Stollenwerk},
  journal={ArXiv},
  year={2021},
  volume={abs/2108.00207}
}
• Published 31 July 2021
• Computer Science
• ArXiv
Neural networks with random weights appear in a variety of machine learning applications, most prominently as the initialization of many deep learning algorithms and as a computationally cheap alternative to fully learned neural networks. In the present article, we enhance the theoretical understanding of random neural networks by addressing the following data separation problem: under what conditions can a random neural network make two classes $X^-, X^+ \subset \mathbb{R}^d$ (with positive distance) linearly…
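The separation question above can be illustrated empirically: embed two classes that are not linearly separable in the input space through a single random ReLU layer, then check whether a linear classifier separates the embedded points. The sketch below is illustrative only and is not the paper's construction; the toy data (two concentric circles), the width, and the bias range are all assumptions chosen for demonstration.

```python
import math
import random

random.seed(0)

d = 2    # input dimension
n = 200  # number of random hidden neurons (assumed width, not from the paper)

def relu(t):
    return max(t, 0.0)

# Toy classes with positive distance: two concentric circles in R^2,
# which no hyperplane can separate in the input space.
angles = [2 * math.pi * k / 20 for k in range(20)]
X_minus = [(math.cos(a), math.sin(a)) for a in angles]          # radius 1
X_plus = [(3 * math.cos(a), 3 * math.sin(a)) for a in angles]   # radius 3

# Random layer phi(x) = ReLU(W x + b) with Gaussian weights and uniform biases.
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
b = [random.uniform(-3, 3) for _ in range(n)]

def features(x):
    return [relu(sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]

# Perceptron on the random features: it converges to zero training errors
# if and only if the embedded classes are linearly separable.
data = [(features(x), -1) for x in X_minus] + [(features(x), +1) for x in X_plus]
w = [0.0] * n
bias = 0.0
for _ in range(500):
    errors = 0
    for phi, y in data:
        if y * (sum(wi * pi for wi, pi in zip(w, phi)) + bias) <= 0:
            errors += 1
            w = [wi + y * pi for wi, pi in zip(w, phi)]
            bias += y
    if errors == 0:
        break

separable = all(y * (sum(wi * pi for wi, pi in zip(w, phi)) + bias) > 0
                for phi, y in data)
print(separable)
```

With enough random neurons the embedded classes become linearly separable with high probability over the draw of the weights; at small widths the perceptron may fail to converge, which is exactly the width dependence the paper quantifies.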
