Deep Networks with Stochastic Depth

@inproceedings{Huang2016DeepNW,
  title={Deep Networks with Stochastic Depth},
  author={Gao Huang and Yu Sun and Zhuang Liu and Daniel Sedra and Kilian Q. Weinberger},
  booktitle={ECCV},
  year={2016}
}
Very deep convolutional networks with hundreds of layers have led to significant reductions in error on competitive benchmarks. Although the unmatched expressiveness of the many layers can be highly desirable at test time, training very deep networks comes with its own set of challenges. The gradients can vanish, the forward flow often diminishes, and the training time can be painfully slow. To address these problems, we propose stochastic depth, a training procedure that enables the seemingly…
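To make the procedure summarized above concrete, here is a minimal PyTorch-style sketch of one residual block with stochastic depth. It is not the authors' reference implementation: the two-convolution residual branch and the fixed survival probability are assumptions for illustration, and the linear decay of the survival probability with depth described in the paper is omitted.

import torch
import torch.nn as nn

class StochasticDepthBlock(nn.Module):
    # Residual block whose residual branch is randomly bypassed during training.
    # `survival_prob` plays the role of p_l in the paper; the per-layer schedule
    # for p_l is omitted for brevity.
    def __init__(self, channels, survival_prob=0.8):
        super().__init__()
        self.survival_prob = survival_prob
        self.residual = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        if self.training:
            # Keep the residual branch with probability p; otherwise pass the
            # identity through unchanged (the block is "dropped").
            if torch.rand(1).item() < self.survival_prob:
                return torch.relu(x + self.residual(x))
            return x
        # At test time every block is active and its residual output is scaled
        # by its survival probability, matching the expected training-time output.
        return torch.relu(x + self.survival_prob * self.residual(x))
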
Wide Residual Networks
TLDR
This paper conducts a detailed experimental study on the architecture of ResNet blocks and proposes a novel architecture in which the depth of residual networks is decreased and their width increased; the resulting network structures, called wide residual networks (WRNs), are far superior to their commonly used thin and very deep counterparts.
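As a rough sketch (not the WRN authors' code) of the kind of widened block the paper studies, the snippet below builds a pre-activation basic block whose stage width has already been multiplied by the widening factor k (e.g. the usual 16/32/64 channels become 160/320/640 for k = 10); the identity or projection shortcut added to this branch is omitted.

import torch.nn as nn

def wide_basic_block(in_channels, out_channels):
    # Pre-activation basic block: BN-ReLU-Conv applied twice, with out_channels
    # equal to the base width times the widening factor k.
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
    )
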
On the importance of network architecture in training very deep neural networks
TLDR
This paper investigates the residual module in great detail by analyzing the ordering of components within different blocks and modifying them one by one to achieve lower test error on the CIFAR-10 dataset, and proposes a random-jump scheme that skips some residual connections during training.
Going Deeper With Neural Networks Without Skip Connections
TLDR
This work proposes the training of very deep PlainNets by leveraging Leaky Rectified Linear Units (LReLUs), parameter constraints and strategic parameter initialization, and reports the best results known on the ImageNet dataset using a PlainNet.
Training Very Deep Networks via Residual Learning with Stochastic Input Shortcut Connections
TLDR
This work proposes a modification of residual learning for training very deep networks to realize improved generalization performance; for this, it allows stochastic shortcut connections of identity mappings from the input to hidden layers.
Depth Dropout: Efficient Training of Residual Convolutional Neural Networks
  • J. Guo, Stephen Gould
  • Computer Science
  • 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA)
  • 2016
TLDR
This paper extends the well-known dropout technique by randomly removing entire network layers instead of individual neurons during training and hence reducing the number of expensive convolution operations needed per training iteration.
Weighted residuals for very deep networks
  • Falong Shen, Gang Zeng
  • Computer Science
  • 2016 3rd International Conference on Systems and Informatics (ICSAI)
  • 2016
TLDR
A weighted residual network is introduced to address the incompatibility between ReLU and element-wise addition as well as the deep-network initialization problem; it learns to combine residuals from different layers effectively and efficiently.
LayerOut: Freezing Layers in Deep Neural Networks
TLDR
This research proposes a novel regularization technique, LayerOut, for training deep neural networks that stochastically freezes the trainable parameters of a layer during an epoch of training, significantly reducing the computational burden.
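A minimal sketch of the layer-freezing idea, assuming a PyTorch nn.Sequential model; the function name and freezing probability are placeholders for illustration, not the LayerOut authors' API.

import random
import torch.nn as nn

def freeze_random_layers(model: nn.Sequential, freeze_prob=0.5):
    # Called once per epoch: each layer is frozen for that epoch with
    # probability `freeze_prob`. Frozen layers still take part in the forward
    # pass, but their parameters receive no gradient updates.
    for layer in model:
        frozen = random.random() < freeze_prob
        for param in layer.parameters():
            param.requires_grad = not frozen
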
Deep networks with stochastic depth for acoustic modelling
TLDR
By randomly dropping a subset of layers during training, the studied stochastic depth training method helps reduce the training time substantially, yet the networks trained are much deeper since all the layers are kept during testing.
The Shallow End: Empowering Shallower Deep-Convolutional Networks through Auxiliary Outputs
TLDR
This paper investigates the supervision vanishing issue in existing backpropagation (BP) methods and proposes to address it via an effective method, called Multi-way BP (MW-BP), which relies on multiple auxiliary losses added to the intermediate layers of the network.
Improving the Capacity of Very Deep Networks with Maxout Units
TLDR
This paper proposes very deep networks with maxout units and elastic net regularization, and shows that the learned features are quite linearly separable and that the networks reach state-of-the-art results on the USPS and MNIST datasets.

References

Showing 1–10 of 39 references
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Deeply-Supervised Nets
TLDR
The proposed deeply-supervised nets (DSN) method simultaneously minimizes classification error while making the learning process of hidden layers direct and transparent, and extends techniques from stochastic gradient methods to analyze the algorithm.
Gradual DropIn of Layers to Train Very Deep Neural Networks
TLDR
It is shown that deep networks, which are untrainable with conventional methods, will converge with DropIn layers interspersed in the architecture, and it is demonstrated that DropIn provides regularization during training in a way analogous to dropout.
On the importance of initialization and momentum in deep learning
TLDR
It is shown that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs to levels of performance that were previously achievable only with Hessian-Free optimization.
Very Deep Convolutional Networks for Large-Scale Image Recognition
TLDR
This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Striving for Simplicity: The All Convolutional Net
TLDR
It is found that max-pooling can simply be replaced by a convolutional layer with increased stride without loss in accuracy on several image recognition benchmarks.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
TLDR
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
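For reference, a minimal sketch of the batch normalization transform this paper introduces, restricted to fully connected activations and training-time statistics; the running averages used at inference and the convolutional variant are omitted.

import torch

def batch_norm(x, gamma, beta, eps=1e-5):
    # x: (batch, features). Normalize each feature over the mini-batch, then
    # apply the learned per-feature scale (gamma) and shift (beta).
    mean = x.mean(dim=0)
    var = x.var(dim=0, unbiased=False)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    return gamma * x_hat + beta
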
Understanding the difficulty of training deep feedforward neural networks
TLDR
The objective is to understand better why standard gradient descent from random initialization does so poorly with deep neural networks, in order to explain recent relative successes and help design better algorithms in the future.
ImageNet classification with deep convolutional neural networks
TLDR
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Dropout: a simple way to prevent neural networks from overfitting
TLDR
It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
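A minimal sketch of dropout in its common "inverted" form; rescaling the surviving units by 1/(1 - p) during training is a standard implementation convenience rather than the paper's exact formulation, which instead rescales the weights at test time.

import torch

def dropout(x, p=0.5, training=True):
    # Zero each activation independently with probability p during training and
    # rescale the survivors by 1/(1 - p) so no rescaling is needed at test time.
    if not training or p == 0.0:
        return x
    mask = (torch.rand_like(x) > p).float()
    return x * mask / (1.0 - p)
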