Corpus ID: 236772380

The Separation Capacity of Random Neural Networks

@article{Dirksen2021TheSC,
  title={The Separation Capacity of Random Neural Networks},
  author={Sjoerd Dirksen and Martin Genzel and Laurent Jacques and Alexander Stollenwerk},
  journal={ArXiv},
  year={2021},
  volume={abs/2108.00207}
}
Neural networks with random weights appear in a variety of machine learning applications, most prominently as the initialization of many deep learning algorithms and as a computationally cheap alternative to fully learned neural networks. In the present article, we enhance the theoretical understanding of random neural networks by addressing the following data separation problem: under what conditions can a random neural network make two classes $X_-, X_+ \subset \mathbb{R}^d$ (with positive distance) linearly separable? …
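To make the setup concrete, here is a minimal numerical sketch (not taken from the paper): two classes that are not linearly separable in the input space are passed through a single random ReLU layer, and a linear model fit on the random features checks whether the embedded classes have become separable. The toy geometry, the width `m`, and the least-squares check are illustrative assumptions, not the authors' construction or proof technique.

```python
# Minimal sketch (not from the paper): embed two classes with a single random
# ReLU layer and check whether a linear model separates the embedded data.
# Toy geometry, width m, and the least-squares check are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 2, 400, 100                     # input dim, hidden width, points per class

# Two classes with positive distance: concentric circles of radius 1 and 2
# (not linearly separable in the input space R^d).
theta = rng.uniform(0.0, 2.0 * np.pi, n)
X_minus = 1.0 * np.c_[np.cos(theta), np.sin(theta)]
X_plus = 2.0 * np.c_[np.cos(theta), np.sin(theta)]
X = np.vstack([X_minus, X_plus])
y = np.r_[-np.ones(n), np.ones(n)]

# Random ReLU layer z = ReLU(Wx + b) with Gaussian weights and random biases.
W = rng.normal(size=(m, d)) / np.sqrt(d)
b = rng.uniform(-2.0, 2.0, size=m)
Z = np.maximum(X @ W.T + b, 0.0)

# Fit a linear classifier on the random features; perfect sign agreement on
# the training set indicates the embedded classes are linearly separable.
Z1 = np.c_[Z, np.ones(len(Z))]
coef, *_ = np.linalg.lstsq(Z1, y, rcond=None)
preds = np.sign(Z1 @ coef)
print("separated after random ReLU layer:", bool(np.all(preds == y)))
```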


References

Showing 1-10 of 63 references

Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?

It is formally proved that deep networks with random Gaussian weights perform a distance-preserving embedding of the data, with in-class and out-of-class data treated differently.
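As a rough empirical illustration of this claim (my sketch, not the paper's argument), one can compare pairwise distances before and after a single random Gaussian ReLU layer; the ratios concentrating in a narrow band reflects the approximately distance-preserving behaviour, with the angle-dependent contraction the paper analyzes. The $\sqrt{2/m}$ scaling and the unit-sphere inputs are assumptions chosen for the demo.

```python
# Rough empirical sketch (my illustration, not the paper's proof): measure how
# pairwise distances change under one random Gaussian ReLU layer. The sqrt(2/m)
# scaling and the unit-sphere inputs are assumptions chosen for the demo.
import numpy as np

rng = np.random.default_rng(1)
d, m, n = 20, 4096, 50

X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)      # points on the unit sphere

W = rng.normal(size=(m, d))
Z = np.sqrt(2.0 / m) * np.maximum(X @ W.T, 0.0)    # random ReLU embedding

def pdists(A):
    # All pairwise Euclidean distances between the rows of A.
    G = A @ A.T
    sq = np.diag(G)[:, None] + np.diag(G)[None, :] - 2.0 * G
    return np.sqrt(np.maximum(sq, 0.0))[np.triu_indices(len(A), k=1)]

ratios = pdists(Z) / pdists(X)
print("embedded/original distance ratios lie in:", (ratios.min(), ratios.max()))
```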

Random Vector Functional Link Networks for Function Approximation on Manifolds

A (corrected) rigorous proof is provided that the Igelnik and Pao construction is a universal approximator for continuous functions on compact domains, with approximation error decaying asymptotically like $O(1/\sqrt{n})$ in the number of network nodes $n$.
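The flavor of such networks can be sketched as follows (my illustration of the general random vector functional link idea, not the Igelnik-Pao construction itself): hidden weights and biases are drawn at random and frozen, and only the linear readout is fit by least squares. The target function, parameter ranges, and activation below are assumptions for the demo.

```python
# Minimal RVFL-style sketch (my illustration, not the Igelnik-Pao construction):
# random hidden weights/biases are frozen; only the output layer is learned.
import numpy as np

rng = np.random.default_rng(2)
n_nodes, n_train = 200, 500

x = np.linspace(-np.pi, np.pi, n_train)[:, None]
f = np.sin(3 * x).ravel()                      # continuous target on a compact domain

W = rng.uniform(-4.0, 4.0, size=(n_nodes, 1))  # random, never trained
b = rng.uniform(-4.0, 4.0, size=n_nodes)
H = np.tanh(x @ W.T + b)                       # random hidden features

beta, *_ = np.linalg.lstsq(H, f, rcond=None)   # only the readout is learned
print("max training error:", np.max(np.abs(H @ beta - f)))
```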

Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks

The expected $0$-$1$ loss of a sufficiently wide ReLU network trained with stochastic gradient descent and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, called the neural tangent random feature (NTRF) model.
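A minimal sketch of the NTRF idea (my illustration, not the paper's analysis): take the gradient of a small two-layer ReLU network's output with respect to its first-layer weights at random initialization, treat that gradient as a fixed feature map, and fit a linear model on top. The network sizes and labels below are assumptions for the demo.

```python
# Minimal NTRF-style sketch (my illustration, not the paper's analysis).
import numpy as np

rng = np.random.default_rng(3)
d, m, n = 5, 64, 200

X = rng.normal(size=(n, d))
y = np.sign(X[:, 0] * X[:, 1])                    # arbitrary nonlinear labels

W0 = rng.normal(size=(m, d)) / np.sqrt(d)         # first layer at initialization
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # fixed second layer

# f(x) = sum_r a_r * ReLU(w_r . x); its gradient w.r.t. W gives the NTRF map:
# d f / d w_r = a_r * 1{w_r . x > 0} * x, flattened into an (m*d)-vector.
pre = X @ W0.T                                    # (n, m) pre-activations
grads = (a * (pre > 0))[:, :, None] * X[:, None, :]   # (n, m, d)
Phi = grads.reshape(n, m * d)

# Linear model on the frozen gradient features (the random feature model).
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
acc = np.mean(np.sign(Phi @ theta) == y)
print("training accuracy of the NTRF linear model:", acc)
```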

Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit

This paper shows that the number of hidden units only needs to exceed a quantity that depends on the regularity properties of the data and is independent of the dimension, and it generalizes the analysis to unbounded activation functions.

On the Power and Limitations of Random Features for Understanding Neural Networks

This paper rigorously shows that random features cannot be used to learn even a single ReLU neuron with standard Gaussian inputs unless the network size is exponentially large, even though a single neuron is learnable with gradient-based methods.

A Convergence Theory for Deep Learning via Over-Parameterization

This work proves that stochastic gradient descent can find global minima of the training objective of DNNs in polynomial time, and it implies an equivalence between over-parameterized neural networks and the neural tangent kernel (NTK) in the finite (and polynomial) width setting.

An Improved Analysis of Training Over-parameterized Deep Neural Networks

An improved analysis of the global convergence of (stochastic) gradient descent for training deep neural networks is provided, requiring a milder over-parameterization condition than previous work in terms of the training sample size and other problem-dependent parameters.

A mean field view of the landscape of two-layer neural networks

A compact description of the SGD dynamics is derived in terms of a limiting partial differential equation that allows for “averaging out” some of the complexities of the landscape of neural networks and can be used to prove a general convergence result for noisy SGD.

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data

It is proved that SGD learns a network with small generalization error, even though the network has enough capacity to fit arbitrary labels, when the data come from mixtures of well-separated distributions.

On Exact Computation with an Infinitely Wide Neural Net

The current paper gives the first efficient exact algorithm for computing the extension of the NTK to convolutional neural nets, called the Convolutional NTK (CNTK), as well as an efficient GPU implementation of this algorithm.
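As a simpler fully connected stand-in for the convolutional case (my illustration, not the paper's CNTK algorithm), the sketch below runs kernel regression with the widely used closed-form NTK of an infinitely wide two-layer ReLU network; the normalization constants and the toy regression task are assumptions.

```python
# Minimal sketch (my illustration, not the paper's CNTK algorithm): kernel
# regression with the closed-form NTK of an infinitely wide two-layer ReLU
# network, used here as a fully connected stand-in for the convolutional case.
import numpy as np

def ntk_two_layer_relu(A, B):
    """NTK of a two-layer ReLU net (both layers trained), up to scaling."""
    na = np.linalg.norm(A, axis=1, keepdims=True)
    nb = np.linalg.norm(B, axis=1, keepdims=True)
    u = np.clip((A @ B.T) / (na * nb.T), -1.0, 1.0)
    k0 = (np.pi - np.arccos(u)) / np.pi
    k1 = (u * (np.pi - np.arccos(u)) + np.sqrt(1.0 - u ** 2)) / np.pi
    return (A @ B.T) * k0 + (na * nb.T) * k1

rng = np.random.default_rng(4)
d, n_train, n_test = 3, 100, 20
X = rng.normal(size=(n_train, d))
Xt = rng.normal(size=(n_test, d))
f = lambda Z: np.sin(Z[:, 0]) + Z[:, 1] ** 2       # toy regression target
y, yt = f(X), f(Xt)

K = ntk_two_layer_relu(X, X) + 1e-8 * np.eye(n_train)   # jitter for stability
alpha = np.linalg.solve(K, y)
pred = ntk_two_layer_relu(Xt, X) @ alpha
print("test RMSE of NTK regression:", np.sqrt(np.mean((pred - yt) ** 2)))
```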
...