Randomly Initialized One-Layer Neural Networks Make Data Linearly Separable

Promit Ghosal, Srinath Mahankali, Yihang Sun
Recently, neural networks have been shown to perform exceptionally well in transforming two arbitrary sets into two linearly separable sets. Doing this with a randomly initialized neural network is of immense interest because the associated computation is cheaper than using fully trained networks. In this paper, we show that, with sufficient width, a randomly initialized one-layer neural network transforms two sets into two linearly separable sets with high probability. Furthermore, we provide…
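The abstract's claim can be illustrated with a minimal numerical sketch (assuming NumPy; the toy data, network width, and seed are hypothetical choices for illustration, not the paper's construction): two concentric classes that are not linearly separable in the plane become linearly separable after a single randomly initialized ReLU layer of sufficient width.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical, not from the paper): a disc surrounded by a ring.
# The two classes are not linearly separable in the original 2-D space.
n = 200
theta = rng.uniform(0.0, 2.0 * np.pi, n)
disc = rng.uniform(0.0, 0.5, n)[:, None] * np.c_[np.cos(theta), np.sin(theta)]
ring = rng.uniform(1.0, 1.5, n)[:, None] * np.c_[np.cos(theta), np.sin(theta)]
X = np.vstack([disc, ring])
y = np.array([-1.0] * n + [1.0] * n)

def linear_accuracy(features, labels):
    """Fit a least-squares linear separator and report its training accuracy."""
    A = np.c_[features, np.ones(len(features))]  # append a bias column
    w, *_ = np.linalg.lstsq(A, labels, rcond=None)
    return np.mean(np.sign(A @ w) == labels)

# A randomly initialized one-layer ReLU network of width m; weights stay untrained.
m = 500
W = rng.standard_normal((2, m))
b = rng.uniform(-1.0, 1.0, m)
hidden = np.maximum(X @ W + b, 0.0)  # ReLU(xW + b)

print(linear_accuracy(X, y))       # < 1.0: raw data is not linearly separable
print(linear_accuracy(hidden, y))  # 1.0: random ReLU features separate the classes
```

Because the feature matrix has more columns (500) than data points (400), it generically has full row rank, so an exactly interpolating linear separator exists in feature space; the paper's contribution is a high-probability guarantee of this kind for sufficiently wide random one-layer networks.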

The Separation Capacity of Random Neural Networks
It is shown that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve the data separation problem, answering the question of under what conditions a random neural network can make two classes X−, X+ linearly separable.
On the Power and Limitations of Random Features for Understanding Neural Networks
This paper rigorously shows that random features cannot be used to learn even a single ReLU neuron with standard Gaussian inputs unless the network size is exponentially large, whereas a single neuron is learnable with gradient-based methods.
Random Vector Functional Link Networks for Function Approximation on Manifolds
A corrected, rigorous proof is provided that the Igelnik and Pao construction is a universal approximator for continuous functions on compact domains, with approximation error decaying asymptotically like $O(1/\sqrt{n})$ in the number $n$ of network nodes.
The Benefits of Over-parameterization at Initialization in Deep ReLU Networks
This paper proves several desirable theoretical properties of over-parameterized ReLU networks at initialization, including a hidden-activation norm property that holds under He initialization, and shows that this property holds for a finite-width network even when the number of data samples is infinite.
Identity Matters in Deep Learning
This work gives a strikingly simple proof that arbitrarily deep linear residual networks have no spurious local optima, and shows that residual networks with ReLU activations have universal finite-sample expressivity: the network can represent any function of its sample provided that the model has more parameters than the sample size.
ImageNet classification with deep convolutional neural networks
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Understanding deep learning requires rethinking generalization
These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth-two neural networks already have perfect finite-sample expressivity.
Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions
This paper rigorously proves that standard single-hidden layer feedforward networks with at most N hidden neurons and with any bounded nonlinear activation function which has a limit at one infinity can learn N distinct samples with zero error.
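This memorization result can be illustrated with a small numerical sketch (assuming NumPy; the sample size, sigmoid activation, and random input weights are illustrative choices, not the paper's construction): with exactly N hidden neurons, the N×N hidden-layer output matrix is generically invertible, so output weights interpolating all N targets can be solved for directly.

```python
import numpy as np

rng = np.random.default_rng(1)

# N distinct samples (illustrative data, not from the paper).
N = 10
X = rng.standard_normal((N, 3))   # inputs in R^3
y = rng.standard_normal(N)        # arbitrary real targets

# Single hidden layer with exactly N neurons and a bounded activation (sigmoid).
W = rng.standard_normal((3, N))
b = rng.standard_normal(N)
H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # N x N hidden-layer output matrix

# H is generically invertible, so the output weights can interpolate exactly.
beta = np.linalg.solve(H, y)
max_error = np.max(np.abs(H @ beta - y))
print(max_error)  # numerically zero: all N samples fit with zero error
```

The paper's contribution is stronger than this generic-invertibility observation: it covers any bounded nonlinear activation with a limit at one infinity, with an explicit construction rather than a random one.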
Learning capability and storage capacity of two-hidden-layer feedforward networks
  • G. Huang, IEEE Trans. Neural Networks, 2003
This paper rigorously proves, by a constructive method, that two-hidden-layer feedforward networks (TLFNs) with $2\sqrt{(m+2)N}$ ($\ll N$) hidden neurons can learn any N distinct samples with arbitrarily small error, where m is the required number of output neurons.
Extreme learning machine: Theory and applications