Corpus ID: 166228413

All Neural Networks are Created Equal

  title={All Neural Networks are Created Equal},
  author={Guy Hacohen and Leshem Choshen and D. Weinshall},
One of the unresolved questions in deep learning is the nature of the solutions that are being discovered. We investigate the collection of solutions reached by the same network architecture, with different random initialization of weights and random mini-batches. These solutions are shown to be rather similar - more often than not, each train and test example is either classified correctly by all the networks, or by none at all. Surprisingly, all the network instances seem to share the same… Expand
Neural Network Memorization Dissection
The analysis shows that DNNs have \textit{One way to Learn} and \text it{N ways to Memorize} and the analysis uses gradient information to gain an understanding of the analysis results. Expand


Convergent Learning: Do different neural networks learn the same representations?
This paper investigates the extent to which neural networks exhibit convergent learning, which is when the representations learned by multiple nets converge to a set of features which are either individually similar between networks or where subsets of features span similar low-dimensional spaces. Expand
Understanding the difficulty of training deep feedforward neural networks
The objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future. Expand
On The Power of Curriculum Learning in Training Deep Networks
This work analyzes the effect of curriculum learning, which involves the non-uniform sampling of mini-batches, on the training of deep networks, and specifically CNNs trained for image recognition, and defines the concept of an ideal curriculum. Expand
Towards Understanding Learning Representations: To What Extent Do Different Neural Networks Learn the Same Representation
The theory gives a complete characterization of the structure of neuron activation subspace matches, where the core concepts are maximum match and simple match which describe the overall and the finest similarity between sets of neurons in two networks respectively. Expand
Curriculum learning
It is hypothesized that curriculum learning has both an effect on the speed of convergence of the training process to a minimum and on the quality of the local minima obtained: curriculum learning can be seen as a particular form of continuation method (a general strategy for global optimization of non-convex functions). Expand
Why Does Unsupervised Pre-training Help Deep Learning?
The results suggest that unsupervised pre-training guides the learning towards basins of attraction of minima that support better generalization from the training data set; the evidence from these results supports a regularization explanation for the effect of pre- training. Expand
Learning Multiple Layers of Features from Tiny Images
It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network. Expand
MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels
Experimental results demonstrate that the proposed novel technique of learning another neural network, called MentorNet, to supervise the training of the base deep networks, namely, StudentNet, can significantly improve the generalization performance of deep networks trained on corrupted training data. Expand
Understanding deep learning requires rethinking generalization
These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth two neural networks already have perfect finite sample expressivity. Expand
Deep Nets Don't Learn via Memorization
It is established that there are qualitative differences when learning noise vs. natural datasets, and that for appropriately tuned explicit regularization, e.g. dropout, DNN training performance can be degraded on noise datasets without compromising generalization on real data. Expand