Corpus ID: 236772380

The Separation Capacity of Random Neural Networks

@article{Dirksen2021TheSC,
  title={The Separation Capacity of Random Neural Networks},
  author={Sjoerd Dirksen and Martin Genzel and Laurent Jacques and Alexander Stollenwerk},
  journal={ArXiv},
  year={2021},
  volume={abs/2108.00207}
}
Neural networks with random weights appear in a variety of machine learning applications, most prominently as the initialization of many deep learning algorithms and as a computationally cheap alternative to fully learned neural networks. In the present article we enhance the theoretical understanding of random neural nets by addressing the following data separation problem: under what conditions can a random neural network make two classes $X^-, X^+ \subset \mathbb{R}^d$ (with positive distance) linearly separable…
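
To make the question concrete, here is a hedged numpy sketch (not the paper's construction or proof; the data model, dimensions, and weight scalings are illustrative assumptions): two planar classes with positive distance that are not linearly separable in the input space are lifted through a single random ReLU layer, and a simple perceptron then checks whether the lifted classes have become linearly separable.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_class, n_hidden = 200, 300

# Two planar classes with positive distance that are NOT linearly separable
# in the input space: an inner disk (X-) and an outer annulus (X+).
theta = rng.uniform(0.0, 2.0 * np.pi, 2 * n_per_class)
radius = np.concatenate([rng.uniform(0.0, 1.0, n_per_class),   # X-
                         rng.uniform(2.0, 3.0, n_per_class)])  # X+
X = np.c_[radius * np.cos(theta), radius * np.sin(theta)]
y = np.concatenate([-np.ones(n_per_class), np.ones(n_per_class)])

# One random ReLU layer: x -> relu(W x + b), Gaussian weights, random biases.
W = rng.normal(size=(n_hidden, 2))
b = rng.uniform(-3.0, 3.0, n_hidden)
F = np.maximum(X @ W.T + b, 0.0)
F = np.c_[F, np.ones(len(F))]  # constant feature so the separator can have a bias

# Perceptron in feature space: a full pass with zero mistakes certifies that
# the lifted classes are linearly separable (by the weight vector found so far).
w = np.zeros(F.shape[1])
for _ in range(2000):
    mistakes = 0
    for f, label in zip(F, y):
        if label * (f @ w) <= 0:
            w += label * f
            mistakes += 1
    if mistakes == 0:
        break
print("linearly separable after the random layer:", mistakes == 0)
```

If the perceptron finishes a pass without mistakes, the random features admit a separating hyperplane; increasing `n_hidden` makes this increasingly likely for classes with positive distance, which is the regime the paper studies.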


References

SHOWING 1-10 OF 47 REFERENCES
Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?
TLDR
It is formally proved that these networks with random Gaussian weights perform a distance-preserving embedding of the data, with a special treatment for in-class and out-of-class data.
Random Vector Functional Link Networks for Function Approximation on Manifolds
TLDR
A (corrected) rigorous proof that the Igelnik and Pao construction is a universal approximator for continuous functions on compact domains, with approximation error decaying asymptotically like $O(1/\sqrt{n})$ in the number of network nodes $n$, is provided.
On the Power and Limitations of Random Features for Understanding Neural Networks
TLDR
This paper rigorously shows that random features cannot be used to learn even a single ReLU neuron with standard Gaussian inputs unless the network size is exponentially large, even though a single neuron is learnable with gradient-based methods.
Learning Polynomials with Neural Networks
TLDR
This paper shows that for a randomly initialized neural network with sufficiently many hidden units, the generic gradient descent algorithm learns any low-degree polynomial, and that if complex-valued weights are used, there are no "robust local minima".
Understanding deep learning requires rethinking generalization
TLDR
These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth two neural networks already have perfect finite sample expressivity.
Understanding deep learning (still) requires rethinking generalization
TLDR
These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and corroborate these experimental findings with a theoretical construction showing that simple depth two neural networks already have perfect finite sample expressivity.
Extreme learning machine: Theory and applications
TLDR
A new learning algorithm called extreme learning machine (ELM) is proposed for single-hidden-layer feedforward neural networks (SLFNs): it randomly chooses the hidden nodes and analytically determines the output weights, and tends to provide good generalization performance at extremely fast learning speed.
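
The ELM recipe summarized above lends itself to a very short sketch. The numpy code below is an assumption-laden illustration (tanh hidden activation, a toy regression task), not the authors' implementation: the hidden-layer weights are drawn at random and frozen, and only the output weights are computed in closed form by least squares.

```python
import numpy as np

def elm_fit(X, y, n_hidden=200, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(size=(d, n_hidden))            # random input-to-hidden weights
    b = rng.normal(size=n_hidden)                 # random hidden biases
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # analytic output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy regression: learn y = sin(x) from noisy samples.
rng = np.random.default_rng(1)
X = rng.uniform(-np.pi, np.pi, size=(300, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=300)
W, b, beta = elm_fit(X, y)
print("train MSE:", np.mean((elm_predict(X, W, b, beta) - y) ** 2))
```
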
Towards a Unified Analysis of Random Fourier Features
TLDR
This work provides the first unified risk analysis of learning with random Fourier features using the squared error and Lipschitz continuous loss functions, and devises a simple approximation scheme which provably reduces the computational cost without loss of statistical efficiency.
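
For context, the random Fourier feature map itself can be sketched in a few lines of numpy. The construction below targets a Gaussian kernel with an illustrative bandwidth and feature count; it is not the approximation scheme devised in the paper. Inner products of the random features approximate the kernel.

```python
import numpy as np

def rff_features(X, n_features=2000, gamma=1.0, seed=0):
    """Map X to z(X) with z(x) @ z(y) approximately exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
Z = rff_features(X)
exact = np.exp(-np.linalg.norm(X[:, None] - X[None, :], axis=-1) ** 2)
print("max kernel approximation error:", np.abs(Z @ Z.T - exact).max())
```
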
The Benefits of Over-parameterization at Initialization in Deep ReLU Networks
TLDR
This paper proves several desirable theoretical properties of over-parameterized ReLU networks at initialization, including a hidden-activation norm-preservation property under He initialization, and shows that this property holds for a finite-width network even when the number of data samples is infinite.
On the Approximation Properties of Random ReLU Features
We study the approximation properties of random ReLU features through their reproducing kernel Hilbert space (RKHS). We first prove a universality theorem for the RKHS induced by random features…