Randomly Initialized One-Layer Neural Networks Make Data Linearly Separable

@article{Ghosal2022RandomlyIO,
  title={Randomly Initialized One-Layer Neural Networks Make Data Linearly Separable},
  author={Promit Ghosal and Srinath Mahankali and Yihang Sun},
  journal={ArXiv},
  year={2022},
  volume={abs/2205.11716}
}
Recently, neural networks have been shown to perform exceptionally well in transforming two arbitrary sets into two linearly separable sets. Doing this with a randomly initialized neural network is of immense interest because the associated computation is cheaper than using fully trained networks. In this paper, we show that, with sufficient width, a randomly initialized one-layer neural network transforms two sets into two linearly separable sets with high probability. Furthermore, we provide… 
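Although not part of the paper, a minimal sketch of the phenomenon the abstract describes may help: pass two classes that are not linearly separable in the input space through a randomly initialized one-layer ReLU network and check whether a linear classifier separates the transformed data. The concentric-circle data, the width of 500, and the use of scikit-learn's LinearSVC are illustrative assumptions, not choices taken from the paper.

import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Two classes on concentric circles: not linearly separable in the input space.
n, d = 200, 2
theta = rng.uniform(0, 2 * np.pi, size=n)
radius = np.where(np.arange(n) < n // 2, 1.0, 3.0)
X = np.stack([radius * np.cos(theta), radius * np.sin(theta)], axis=1)
X += 0.05 * rng.standard_normal(X.shape)
y = (np.arange(n) < n // 2).astype(int)

# Randomly initialized one-layer ReLU network used as a fixed feature map;
# the Gaussian weights W and biases b are never trained.
width = 500  # "sufficient width" is an assumption here, not the paper's bound
W = rng.standard_normal((d, width))
b = rng.standard_normal(width)
features = np.maximum(X @ W + b, 0.0)

# If the transformed sets are linearly separable, a linear classifier fitted
# on the random features should reach (near-)perfect training accuracy.
for Z, name in [(X, "raw inputs"), (features, "random ReLU features")]:
    clf = LinearSVC(C=1e6, max_iter=50000).fit(Z, y)
    print(f"{name}: training accuracy = {clf.score(Z, y):.3f}")

On the raw inputs the linear classifier stays near chance level, while on the random ReLU features it typically reaches 100% training accuracy, consistent with the separability claim.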

References

SHOWING 1-10 OF 48 REFERENCES

The Separation Capacity of Random Neural Networks

TLDR
It is shown that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can make two classes X−, X+ linearly separable, answering the question of under what conditions a random neural network solves this data separation problem.

On the Power and Limitations of Random Features for Understanding Neural Networks

TLDR
This paper rigorously shows that random features cannot be used to learn even a single ReLU neuron with standard Gaussian inputs unless the network size is exponentially large, even though a single neuron is learnable with gradient-based methods.

Random Vector Functional Link Networks for Function Approximation on Manifolds

TLDR
A (corrected) rigorous proof is provided that the Igelnik and Pao construction is a universal approximator for continuous functions on compact domains, with approximation error decaying asymptotically like $O(1/\sqrt{n})$ in the number of network nodes $n$.

Identity Matters in Deep Learning

TLDR
This work gives a strikingly simple proof that arbitrarily deep linear residual networks have no spurious local optima and shows that residual networks with ReLU activations have universal finite-sample expressivity, in the sense that the network can represent any function of its sample provided that the model has more parameters than the sample size.

ImageNet classification with deep convolutional neural networks

TLDR
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

Understanding deep learning requires rethinking generalization

TLDR
These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth two neural networks already have perfect finite sample expressivity.

Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions

TLDR
This paper rigorously proves that standard single-hidden layer feedforward networks with at most N hidden neurons and with any bounded nonlinear activation function which has a limit at one infinity can learn N distinct samples with zero error.

Learning capability and storage capacity of two-hidden-layer feedforward networks

  • G. Huang
  • Computer Science
  • IEEE Trans. Neural Networks
  • 2003
TLDR
This paper rigorously proves, by a constructive method, that two-hidden-layer feedforward networks (TLFNs) with $2\sqrt{(m+2)N}$ ($\ll N$) hidden neurons can learn any N distinct samples with arbitrarily small error, where m is the required number of output neurons.
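As a rough sense of scale (the numbers here are our own illustration, not taken from that paper): with N = 10,000 distinct samples and m = 1 output neuron, the bound gives $2\sqrt{(m+2)N} = 2\sqrt{30000} \approx 346$ hidden neurons, far fewer than the roughly N neurons required by the single-hidden-layer result above.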

Extreme learning machine: Theory and applications

On the capabilities of multilayer perceptrons

  • E. Baum
  • Computer Science
  • J. Complex.
  • 1988