Corpus ID: 18359848

Residual Networks are Exponential Ensembles of Relatively Shallow Networks

@article{Veit2016ResidualNA,
  title={Residual Networks are Exponential Ensembles of Relatively Shallow Networks},
  author={Andreas Veit and Michael J. Wilber and Serge J. Belongie},
  journal={ArXiv},
  year={2016},
  volume={abs/1605.06431}
}
In this work, we introduce a novel interpretation of residual networks showing they are exponential ensembles. This observation is supported by a large-scale lesion study that demonstrates they behave just like ensembles at test time. Subsequently, we perform an analysis showing these ensembles mostly consist of networks that are each relatively shallow. For example, contrary to our expectations, most of the gradient in a residual network with 110 layers comes from an ensemble of very short… 
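The paper's central claim, that a residual network with n blocks unravels into an ensemble of 2^n paths and that deleting a block at test time removes only the paths passing through it, can be illustrated with a small sketch. This is not the authors' code; linear residual branches are assumed so the path expansion is exact.

```python
# Minimal sketch (not the authors' code): with linear residual branches
# f_i(x) = W_i @ x, the forward pass  x_i = x_{i-1} + W_i @ x_{i-1}
# expands into a sum over all 2^n subsets of blocks, i.e. an ensemble of
# 2^n paths of different lengths; deleting block j removes only the
# 2^(n-1) paths that pass through it.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3                                   # 4 residual blocks, 3-dim features
Ws = [0.1 * rng.standard_normal((d, d)) for _ in range(n)]
x0 = rng.standard_normal(d)

# Ordinary forward pass through the residual blocks.
x = x0.copy()
for W in Ws:
    x = x + W @ x

# Unraveled view: sum the contribution of every subset of blocks (every path).
unraveled = np.zeros(d)
subsets = itertools.chain.from_iterable(
    itertools.combinations(range(n), k) for k in range(n + 1))
for subset in subsets:
    path = x0.copy()
    for i in subset:                          # apply the chosen blocks in order
        path = Ws[i] @ path
    unraveled += path

print(np.allclose(x, unraveled))              # True: 16 paths reproduce the output
```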

Citations

Multi-Residual Networks: Improving the Speed and Accuracy of Residual Networks
TLDR
A new convolutional neural network architecture is proposed which builds upon the success of residual networks by explicitly exploiting the interpretation of very deep networks as an ensemble, and generates models that are wider, rather than deeper, which significantly improves accuracy.
Multi-Residual Networks
TLDR
The effective range of such ensembles is examined by introducing multi-residual networks, which significantly improve the classification accuracy of residual networks and obtain a test error rate of 3.92% on CIFAR-10, outperforming all existing models.
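As a rough sketch of the multi-residual idea (wider blocks that sum several residual functions onto one identity path), assuming tiny linear-plus-ReLU branches in place of the paper's convolutional ones:

```python
# Minimal sketch (numpy, linear+ReLU branches assumed instead of the paper's
# convolutional residual functions): a multi-residual block sums k parallel
# residual functions onto a single identity path, making the block wider
# rather than deeper.
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 3                                   # feature width, branches per block

def multi_residual_block(x, branch_weights):
    """x_{l+1} = x_l + sum_j f_j(x_l) with f_j(x) = relu(W_j @ x)."""
    out = x.copy()
    for W in branch_weights:                  # k parallel residual branches
        out = out + np.maximum(W @ x, 0.0)
    return out

branch_weights = [0.1 * rng.standard_normal((d, d)) for _ in range(k)]
x = rng.standard_normal(d)
print(multi_residual_block(x, branch_weights).shape)   # (8,)
```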
New architectures for very deep learning
TLDR
This thesis develops new architectures that, for the first time, allow very deep networks to be optimized efficiently and reliably, and addresses two key issues that hamper credit assignment in neural networks: cross-pattern interference and vanishing gradients.
Residual Connections Encourage Iterative Inference
TLDR
It is shown that residual connections naturally encourage the features of residual blocks to move along the negative gradient of the loss from one block to the next, and empirical analysis suggests that ResNets are able to perform both representation learning and iterative refinement.
On the Connection of Deep Fusion to Ensembling
TLDR
A new deeply-fused network is developed that combines two networks in a merge-and-run fashion; it is less deep than a ResNet yet yields an ensemble of the same number of more capable component networks, thus improving classification accuracy.
Highway and Residual Networks learn Unrolled Iterative Estimation
TLDR
It is demonstrated that an alternative viewpoint based on unrolled iterative estimation, in which a group of successive layers iteratively refines its estimates of the same features instead of computing an entirely new representation, leads to the construction of Highway and Residual networks.
DiracNets: Training Very Deep Neural Networks Without Skip-Connections
TLDR
A simple Dirac weight parameterization is proposed that allows training very deep plain networks without explicit skip-connections while achieving nearly the same performance.
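A simplified fully connected analogue of the Dirac parameterization (the paper works with convolutional kernels and adds normalization and per-channel scaling, both omitted here) shows how folding the identity into the weights removes the explicit skip:

```python
# Simplified sketch (fully connected analogue of DiracNets): folding the
# identity into the weights, W_hat = I + W, lets a "plain" layer compute the
# same function as a residual layer with an explicit skip connection.
import numpy as np

rng = np.random.default_rng(0)
d = 16
W = 0.05 * rng.standard_normal((d, d))        # learned part of the weights
W_hat = np.eye(d) + W                         # Dirac (identity) part folded in

x = rng.standard_normal(d)
y_plain = np.maximum(W_hat @ x, 0.0)          # plain layer, no skip connection
y_skip = np.maximum(x + W @ x, 0.0)           # residual layer with explicit skip
print(np.allclose(y_plain, y_skip))           # True: identical computation
```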
Robust Large Margin Deep Neural Networks
TLDR
The analysis leads to the conclusion that a bounded spectral norm of the network's Jacobian matrix in the neighbourhood of the training samples is crucial for a deep neural network of arbitrary depth and width to generalize well.
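To make the quantity in this summary concrete, here is a minimal sketch for a hypothetical two-layer ReLU network, where the input-output Jacobian at a sample has a closed form and its spectral norm is its largest singular value:

```python
# Minimal sketch (hypothetical two-layer ReLU network, numpy only): the
# spectral norm of the input-output Jacobian at a sample x is the largest
# singular value of  J(x) = W2 @ diag(relu'(W1 @ x)) @ W1.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 10, 32, 4
W1 = rng.standard_normal((d_hidden, d_in)) / np.sqrt(d_in)
W2 = rng.standard_normal((d_out, d_hidden)) / np.sqrt(d_hidden)

def jacobian_spectral_norm(x):
    pre = W1 @ x
    mask = (pre > 0).astype(float)            # ReLU derivative at this input
    J = W2 @ (mask[:, None] * W1)             # Jacobian of f(x) = W2 relu(W1 x)
    return np.linalg.svd(J, compute_uv=False)[0]

x = rng.standard_normal(d_in)
print(jacobian_spectral_norm(x))              # local sensitivity around x
```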
PolyNet: A Pursuit of Structural Diversity in Very Deep Networks
TLDR
This work presents a new family of modules, namely the PolyInception, which can be flexibly inserted in isolation or in composition as replacements for different parts of a network, and demonstrates substantial improvements over the state of the art on the ILSVRC 2012 benchmark.
...

References

Showing 1-10 of 24 references
Deep Networks with Stochastic Depth
TLDR
Stochastic depth is proposed, a training procedure that enables the seemingly contradictory setup of training short networks and using deep networks at test time; it reduces training time substantially and improves the test error significantly on almost all data sets used for evaluation.
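A minimal numpy sketch of this training procedure, assuming linear residual branches and a linearly decaying survival schedule:

```python
# Minimal sketch (numpy, linear residual branches assumed): during training
# each residual block is skipped with some probability, so the sampled
# network is short; at test time every block is kept and its residual branch
# is scaled by its survival probability.
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 8
Ws = [0.1 * rng.standard_normal((d, d)) for _ in range(n)]
# Linearly decaying survival probabilities, from 1.0 down to 0.5.
p_survive = [1.0 - 0.5 * l / (n - 1) for l in range(n)]

def forward(x, train):
    for W, p in zip(Ws, p_survive):
        if train:
            if rng.random() < p:              # block survives this pass
                x = x + W @ x
            # otherwise the block is skipped: identity only
        else:
            x = x + p * (W @ x)               # expected residual at test time
    return x

x = rng.standard_normal(d)
print(forward(x, train=True).shape, forward(x, train=False).shape)
```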
Intriguing properties of neural networks
TLDR
It is found that there is no distinction between individual high-level units and random linear combinations of high-level units, according to various methods of unit analysis, suggesting that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.
Identity Mappings in Deep Residual Networks
TLDR
The propagation formulations behind the residual building blocks suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation.
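The direct-propagation property can be checked numerically: with identity skips, any deeper feature equals a shallower one plus a sum of residual terms. A minimal sketch, assuming generic ReLU residual branches:

```python
# Minimal sketch (numpy, generic ReLU residual branches): with identity skip
# connections, any deeper feature equals a shallower feature plus a sum of
# residual terms,  x_L = x_l + sum_{i=l}^{L-1} F_i(x_i).
import numpy as np

rng = np.random.default_rng(0)
L, d = 5, 4
Ws = [0.1 * rng.standard_normal((d, d)) for _ in range(L)]
F = [lambda x, W=W: np.maximum(W @ x, 0.0) for W in Ws]   # residual branches

x = [rng.standard_normal(d)]                  # x[0] is the input
for i in range(L):
    x.append(x[i] + F[i](x[i]))               # identity skip + residual branch

l = 2                                         # pick any shallower layer
direct = x[l] + sum(F[i](x[i]) for i in range(l, L))
print(np.allclose(x[L], direct))              # True: direct propagation holds
```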
How transferable are features in deep neural networks?
TLDR
This paper quantifies the generality versus specificity of neurons in each layer of a deep convolutional neural network and reports a few surprising results, including that initializing a network with transferred features from almost any number of layers can produce a boost to generalization that lingers even after fine-tuning to the target dataset.
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Learning Multiple Layers of Features from Tiny Images
TLDR
It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
TLDR
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Visualizing and Understanding Convolutional Networks
TLDR
A novel visualization technique is introduced that gives insight into the function of intermediate feature layers and the operation of the classifier in large Convolutional Network models; used in a diagnostic role, it helps find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark.
Very Deep Convolutional Networks for Large-Scale Image Recognition
TLDR
This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Improving neural networks by preventing co-adaptation of feature detectors
When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case.
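A minimal numpy sketch of the mechanism described above (random omission of half the hidden units during training, with the corresponding rescaling at test time):

```python
# Minimal sketch (numpy): each hidden unit is omitted with probability 0.5
# during training; at test time all units are kept and activations are halved,
# which matches the paper's prescription of halving the outgoing weights.
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p=0.5, train=True):
    if train:
        mask = (rng.random(h.shape) >= p).astype(h.dtype)
        return h * mask                       # randomly omit units
    return h * (1.0 - p)                      # rescale at test time instead

h = rng.standard_normal(100)
print(dropout(h, train=True)[:5])
print(dropout(h, train=False)[:5])
```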
...