Corpus ID: 237572196

AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks

Garrett Bingham, Risto Miikkulainen
Neural networks require careful weight initialization to prevent signals from exploding or vanishing. Existing initialization schemes solve this problem in specific cases by assuming that the network has a certain activation function or topology. It is difficult to derive such weight initialization strategies, and modern architectures therefore often use these same initialization schemes even though their assumptions do not hold. This paper introduces AutoInit, a weight initialization algorithm…
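
The exploding and vanishing behavior described in the abstract can be illustrated with a small sketch. This is not AutoInit itself: it only propagates a random vector through a stack of dense ReLU layers and compares three weight scales, one of which is the standard He initialization (weight variance 2 / fan_in). The function name `final_rms`, the `gain` parameter, and the depth and width values are assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random


def final_rms(gain, depth=40, width=128, seed=0):
    """Propagate a random vector through `depth` dense ReLU layers.

    Weights are drawn i.i.d. from N(0, gain**2 / width). Returns the
    root-mean-square activation after the last layer.
    (Hypothetical helper for illustration; not part of AutoInit.)
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    std = gain / math.sqrt(width)
    for _ in range(depth):
        # One dense layer followed by ReLU; each weight is sampled on the fly.
        x = [
            max(0.0, sum(rng.gauss(0.0, std) * xj for xj in x))
            for _ in range(width)
        ]
    return math.sqrt(sum(v * v for v in x) / width)


if __name__ == "__main__":
    # gain = sqrt(2) corresponds to He initialization for ReLU layers.
    for name, gain in [("too small", 1.0), ("He", math.sqrt(2.0)), ("too large", 2.0)]:
        print(f"{name:>9}: final RMS = {final_rms(gain):.3e}")
```

In expectation, each ReLU layer scales the mean squared activation by gain² / 2, so a gain below √2 shrinks the signal geometrically with depth, a gain above √2 grows it, and a gain of √2 keeps it roughly constant. A derivation of this kind depends on the activation function and the layer type, which is the case-by-case assumption the abstract refers to.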

How to Initialize your Network? Robust Initialization for WeightNorm & ResNets
This work proposes a novel parameter initialization strategy that avoids explosion or vanishing of information across layers for weight-normalized networks with and without residual connections, and shows that the proposed initialization outperforms existing initialization methods in terms of generalization performance, robustness to hyper-parameter values, and variance between seeds.
Data-dependent Initializations of Convolutional Neural Networks
This work presents a fast and simple data-dependent initialization procedure that sets the weights of a network such that all units in the network train at roughly the same rate, avoiding vanishing or exploding gradients.
Self-Normalizing Neural Networks
Self-normalizing neural networks (SNNs) are introduced to enable high-level abstract representations, and it is proved that activations close to zero mean and unit variance that are propagated through many network layers will converge towards zero mean and unit variance, even in the presence of noise and perturbations.
Centered Weight Normalization in Accelerating Training of Deep Neural Networks
This paper proposes to reparameterize the input weight of each neuron in deep neural networks by normalizing it to zero mean and unit norm, followed by a learnable scalar parameter to adjust the norm of the weight.
Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice
This work uses powerful tools from free probability theory to compute analytically the entire singular value distribution of a deep network's input-output Jacobian, and reveals that controlling the entire distribution of Jacobian singular values is an important design consideration in deep learning.
Fixup Initialization: Residual Learning Without Normalization
This work proposes fixed-update initialization (Fixup), which addresses the exploding and vanishing gradient problem at the beginning of training by properly rescaling a standard initialization. Fixup enables residual networks without normalization to achieve state-of-the-art performance in image classification and machine translation.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks
The results demonstrate how the benefits of a good initialization can persist throughout learning, suggesting an explanation for the recent empirical successes found by initializing very deep non-linear networks according to the principle of dynamical isometry.
Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
This work demonstrates that it is possible to train vanilla CNNs with ten thousand layers or more simply by using an appropriate initialization scheme, and presents an algorithm for generating such random initial orthogonal convolution kernels.
Understanding the difficulty of training deep feedforward neural networks
The objective here is to understand better why standard gradient descent from random initialization performs so poorly with deep neural networks, to better understand these recent relative successes, and to help design better algorithms in the future.