Regularizing Neural Networks via Stochastic Branch Layers
@inproceedings{Park2019RegularizingNN,
  title     = {Regularizing Neural Networks via Stochastic Branch Layers},
  author    = {Wonpyo Park and Paul Hongsuck Seo and Bohyung Han and Minsu Cho},
  booktitle = {Asian Conference on Machine Learning},
  year      = {2019}
}
We introduce a novel stochastic regularization technique for deep neural networks, which decomposes a layer into multiple branches with different parameters and merges stochastically sampled combinations of the outputs from the branches during training. Since the factorized branches can collapse into a single branch through a linear operation, inference requires no additional complexity compared to the ordinary layers. The proposed regularization method, referred to as StochasticBranch, is…
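The abstract describes the mechanism only at a high level. Below is a minimal PyTorch-style sketch of the idea, assuming a fully-connected layer factorized into K branches with independent Bernoulli gates and mean-merging; the class name, gating scheme, and hyperparameters are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class StochasticBranchLinear(nn.Module):
    """Illustrative sketch (not the paper's exact formulation): a linear layer
    factorized into several branches whose outputs are stochastically gated and
    merged during training, and averaged (collapsed) at inference."""

    def __init__(self, in_features, out_features, num_branches=4, keep_prob=0.5):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Linear(in_features, out_features) for _ in range(num_branches)
        )
        self.keep_prob = keep_prob

    def forward(self, x):
        # Stack branch outputs: (K, batch, out_features)
        outs = torch.stack([branch(x) for branch in self.branches], dim=0)
        if self.training:
            # Independent Bernoulli gate per branch and per example, rescaled
            # so the expectation matches the inference-time average.
            k, n = outs.shape[0], outs.shape[1]
            gates = torch.bernoulli(
                torch.full((k, n, 1), self.keep_prob, device=x.device)
            ) / self.keep_prob
            return (gates * outs).mean(dim=0)
        # Inference: the merge is linear, so the branches collapse into a single
        # linear map (computed here as the plain average of branch outputs).
        return outs.mean(dim=0)
```

Because the merge is linear, the K weight matrices and biases could equivalently be pre-averaged into a single nn.Linear at deployment, so inference costs the same as an ordinary fully-connected layer.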
References
Showing 1-10 of 34 references
Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization
- Computer Science, NIPS, 2017
This paper interprets conventional training methods with regularization by noise injection as optimizing a lower bound of the true objective, and proposes a technique that achieves a tighter lower bound by using multiple noise samples per training example in each stochastic gradient descent iteration.
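The tighter bound can be illustrated with a small sketch: the likelihoods from several noise (e.g. dropout) samples are averaged inside the log, rather than averaging the log-likelihoods. The helper below is a hypothetical illustration, not code from the paper.

```python
import torch
import torch.nn.functional as F

def multi_sample_noise_loss(model, x, y, num_samples=4):
    """Hypothetical sketch: average likelihoods over several stochastic (e.g.
    dropout) forward passes *inside* the log, which by Jensen's inequality is
    a tighter lower bound than the average of the log-likelihoods."""
    model.train()  # keep the injected noise (dropout) active
    log_liks = torch.stack([
        -F.cross_entropy(model(x), y, reduction="none")  # per-example log-likelihood
        for _ in range(num_samples)
    ])  # shape: (num_samples, batch)
    # log((1/S) * sum_s p_s) >= (1/S) * sum_s log p_s
    tighter_bound = torch.logsumexp(log_liks, dim=0) - torch.log(
        torch.tensor(float(num_samples))
    )
    return -tighter_bound.mean()
```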
Stochastic Pooling for Regularization of Deep Convolutional Neural Networks
- Computer Science, ICLR, 2013
We introduce a simple and effective method for regularizing large convolutional neural networks. We replace the conventional deterministic pooling operations with a stochastic procedure, randomly…
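The summary is cut off, but the core procedure of stochastic pooling is well established: within each pooling region, one activation is sampled with probability proportional to its value instead of taking the deterministic maximum. A minimal NumPy sketch for a single region, assuming non-negative (e.g. post-ReLU) activations:

```python
import numpy as np

def stochastic_pool(region, rng=None):
    """Sketch of stochastic pooling on one region of non-negative activations:
    sample one activation with probability proportional to its value (training);
    the paper uses the probability-weighted average at test time."""
    rng = rng or np.random.default_rng()
    a = np.asarray(region, dtype=float).ravel()
    total = a.sum()
    if total == 0.0:
        return 0.0                      # all-zero region: nothing to sample
    p = a / total                       # multinomial over the activations
    return rng.choice(a, p=p)
```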
Regularization of Neural Networks using DropConnect
- Computer Science, ICML, 2013
This work introduces DropConnect, a generalization of Dropout, for regularizing large fully-connected layers within neural networks, and derives a bound on the generalization performance of both Dropout and DropConnect.
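For contrast with Dropout, a minimal PyTorch-style sketch of the core idea: the Bernoulli mask is applied to individual weights rather than to activations. The inference path below uses a simple mean-field scaling, whereas the paper derives a Gaussian moment-matching approximation; treat this as an illustration, not the paper's exact procedure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DropConnectLinear(nn.Linear):
    """Sketch: during training, individual weights are zeroed with probability
    1 - keep_prob (Dropout zeroes activations instead)."""

    def __init__(self, in_features, out_features, keep_prob=0.5):
        super().__init__(in_features, out_features)
        self.keep_prob = keep_prob

    def forward(self, x):
        if self.training:
            mask = torch.bernoulli(torch.full_like(self.weight, self.keep_prob))
            return F.linear(x, self.weight * mask, self.bias)
        # Simplified test-time path: scale weights by the keep probability
        # (the paper instead samples from a Gaussian approximation).
        return F.linear(x, self.weight * self.keep_prob, self.bias)
```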
Dropout distillation
- Computer Science, ICML, 2016
This work introduces a novel approach, coined "dropout distillation", that allows training a predictor to better approximate the intractable but preferable averaging process, while keeping its computational efficiency under control.
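The idea can be sketched as training a student to match a Monte-Carlo dropout average of the teacher, which the usual weight-scaling rule only approximates. The function below is a hypothetical illustration of one training step, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def dropout_distillation_step(teacher, student, x, optimizer, num_mc=8):
    """Hypothetical sketch: distill the Monte-Carlo dropout average of the
    teacher's predictions into a deterministic student."""
    teacher.train()                      # keep dropout active for MC sampling
    with torch.no_grad():
        samples = torch.stack(
            [F.softmax(teacher(x), dim=-1) for _ in range(num_mc)]
        )
        target = samples.mean(dim=0)     # MC estimate of the predictive mean
    student.train()
    log_probs = F.log_softmax(student(x), dim=-1)
    loss = F.kl_div(log_probs, target, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```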
Gradual DropIn of Layers to Train Very Deep Neural Networks
- Computer Science, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016
It is shown that deep networks that are untrainable with conventional methods will converge when DropIn layers are interspersed in the architecture, and it is demonstrated that DropIn provides regularization during training in a way analogous to dropout.
Shakeout: A New Regularized Deep Neural Network Training Scheme
- Computer Science, AAAI, 2016
This paper presents a new training scheme, Shakeout, which leads to a combination of L1 and L2 regularization imposed on the weights, a combination proven effective in practice by Elastic Net models.
Adaptive Dropout with Rademacher Complexity Regularization
- Computer Science, ICLR, 2018
This work shows that the network's Rademacher complexity is bounded by a function of the dropout rate vectors and the weight matrices, imposes this bound as a regularizer, and thereby provides a theoretically justified way to trade off model complexity against representation power.
Dropout with Expectation-linear Regularization
- Computer Science, ICLR, 2017
This work first formulates dropout as a tractable approximation of a latent variable model, leading to a clean view of parameter sharing and enabling further theoretical analysis, and then introduces (approximate) expectation-linear dropout neural networks, whose inference gap the authors formally characterize.
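The "inference gap" referred to above is the discrepancy between the stochastic dropout network and its deterministic, weight-scaled counterpart on the same input. Below is a rough, hypothetical sketch of one way to penalize such a gap; the paper's actual regularizer is derived and estimated differently.

```python
import torch

def inference_gap_penalty(model, x, num_samples=4):
    """Hypothetical sketch: mean squared gap between stochastic (dropout-on)
    forward passes and the deterministic (dropout-off) forward pass."""
    model.eval()
    with torch.no_grad():
        deterministic = model(x)          # standard weight-scaled prediction
    model.train()                         # dropout active again
    gap = torch.stack(
        [(model(x) - deterministic).pow(2).mean() for _ in range(num_samples)]
    )
    return gap.mean()
```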
Deep Networks with Stochastic Depth
- Computer Science, ECCV, 2016
Stochastic depth is proposed, a training procedure that enables the seemingly contradictory setup of training short networks while using deep networks at test time; it substantially reduces training time and significantly improves test error on almost all datasets used for evaluation.
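A minimal sketch of the mechanism for one residual block, assuming a fixed survival probability (the paper also uses a survival probability that decays linearly with depth):

```python
import torch
import torch.nn as nn

class StochasticDepthBlock(nn.Module):
    """Sketch: during training the residual branch is skipped entirely with
    probability 1 - p_survive; at test time it is always applied, scaled by
    p_survive."""

    def __init__(self, block, p_survive=0.8):
        super().__init__()
        self.block = block
        self.p_survive = p_survive

    def forward(self, x):
        if self.training:
            if torch.rand(()) < self.p_survive:
                return x + self.block(x)
            return x                      # the block is dropped this iteration
        return x + self.p_survive * self.block(x)
```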
Adaptive dropout for training deep neural networks
- Computer Science, NIPS, 2013
A method called "standout" is described, in which a binary belief network is overlaid on a neural network and used to regularize its hidden units by selectively setting activations to zero; it achieves lower classification error rates than other feature learning methods, including standard dropout, denoising auto-encoders, and restricted Boltzmann machines.
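A minimal sketch of the standout idea, assuming the keep probability of each hidden unit is computed from the layer's own pre-activation via a sigmoid (the paper overlays a separate binary belief network; in practice its parameters can be tied to scaled versions of the layer's own weights):

```python
import torch
import torch.nn as nn

class StandoutLinear(nn.Module):
    """Sketch: an input-dependent ('adaptive') dropout gate per hidden unit,
    here driven by the unit's own pre-activation."""

    def __init__(self, in_features, out_features, alpha=1.0, beta=0.0):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)
        self.alpha, self.beta = alpha, beta

    def forward(self, x):
        pre = self.fc(x)
        keep_prob = torch.sigmoid(self.alpha * pre + self.beta)
        h = torch.relu(pre)
        if self.training:
            mask = torch.bernoulli(keep_prob)
            return mask * h
        return keep_prob * h              # expected gate at test time
```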