Corpus ID: 203641913

Regularizing Neural Networks via Stochastic Branch Layers

@inproceedings{Park2019RegularizingNN,
  title={Regularizing Neural Networks via Stochastic Branch Layers},
  author={Wonpyo Park and Paul Hongsuck Seo and Bohyung Han and Minsu Cho},
  booktitle={Asian Conference on Machine Learning},
  year={2019}
}
We introduce a novel stochastic regularization technique for deep neural networks, which decomposes a layer into multiple branches with different parameters and merges stochastically sampled combinations of the outputs from the branches during training. Since the factorized branches can collapse into a single branch through a linear operation, inference requires no additional complexity compared to the ordinary layers. The proposed regularization method, referred to as StochasticBranch, is… 
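The abstract is cut off above, but the mechanism it describes (split a layer into parallel branches, merge a randomly sampled combination of their outputs during training, collapse the branches back into a single linear layer at test time) can be illustrated with a short sketch. The PyTorch-style module below is a hypothetical illustration under those assumptions, not the authors' implementation; the class name and the num_branches, keep_prob parameters and the Bernoulli merging rule are my own.

import torch
import torch.nn as nn

class StochasticBranchLinear(nn.Module):
    """Sketch of a stochastic-branch linear layer (assumed behavior, not the
    paper's code): several parallel linear branches are merged by a randomly
    sampled weighting during training and by a plain average at test time."""

    def __init__(self, in_features, out_features, num_branches=4, keep_prob=0.5):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Linear(in_features, out_features) for _ in range(num_branches)
        )
        self.keep_prob = keep_prob

    def forward(self, x):
        outs = torch.stack([b(x) for b in self.branches])   # (branches, batch, out)
        if self.training:
            # Sample which branches contribute on this forward pass.
            mask = torch.bernoulli(
                torch.full((len(self.branches),), self.keep_prob, device=x.device)
            )
            if mask.sum() == 0:                              # keep at least one branch
                mask[torch.randint(len(self.branches), (1,))] = 1.0
            weights = mask / mask.sum()
            return (weights.view(-1, 1, 1) * outs).sum(dim=0)
        # Branches are linear, so averaging their outputs equals a single linear
        # layer with averaged weights: no additional inference cost.
        return outs.mean(dim=0)

In this sketch, an nn.Linear(128, 64) could be swapped for StochasticBranchLinear(128, 64) during training, and the branch weights averaged into one weight matrix for deployment.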


References

Showing 1–10 of 34 references

Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization

This paper interprets conventional training with regularization by noise injection as optimizing a lower bound of the true objective, and proposes a technique that achieves a tighter lower bound by using multiple noise samples per training example in each stochastic gradient descent iteration.

Stochastic Pooling for Regularization of Deep Convolutional Neural Networks

We introduce a simple and effective method for regularizing large convolutional neural networks. We replace the conventional deterministic pooling operations with a stochastic procedure, randomly picking the activation within each pooling region according to a multinomial distribution given by the activations within that region.
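As a concrete illustration of that procedure, here is a minimal sketch of the training-time behavior (my own code, not the paper's); it assumes non-negative activations such as ReLU outputs, and the function name and kernel_size handling are assumptions. At test time the paper instead uses the probability-weighted average of each region rather than a sample.

import torch
import torch.nn.functional as F

def stochastic_pool2d(x, kernel_size=2):
    """Sketch of stochastic pooling (training behavior): in each pooling region,
    sample one activation with probability proportional to its magnitude."""
    n, c, h, w = x.shape
    k2 = kernel_size * kernel_size
    patches = F.unfold(x, kernel_size, stride=kernel_size)      # (N, C*k*k, L)
    patches = patches.view(n, c, k2, -1).permute(0, 1, 3, 2)    # (N, C, L, k*k)
    flat = patches.reshape(-1, k2)
    probs = flat.clamp_min(0) + 1e-6                            # avoid all-zero regions
    probs = probs / probs.sum(dim=1, keepdim=True)
    idx = torch.multinomial(probs, 1)                           # one sample per region
    pooled = flat.gather(1, idx)
    return pooled.view(n, c, h // kernel_size, w // kernel_size)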

Regularization of Neural Networks using DropConnect

This work introduces DropConnect, a generalization of Dropout, for regularizing large fully-connected layers within neural networks, and derives a bound on the generalization performance of both Dropout and DropConnect.
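A minimal sketch of the weight-level masking that distinguishes DropConnect from Dropout, assuming a fully-connected layer; the class name and drop_prob parameter are my own, and the test-time scaling below is the common mean-field shortcut rather than the paper's Gaussian moment-matching inference procedure.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DropConnectLinear(nn.Linear):
    """Sketch: randomly zero individual *weights* (not activations) during training."""

    def __init__(self, in_features, out_features, drop_prob=0.5):
        super().__init__(in_features, out_features)
        self.drop_prob = drop_prob

    def forward(self, x):
        if self.training:
            keep = torch.bernoulli(torch.full_like(self.weight, 1.0 - self.drop_prob))
            return F.linear(x, self.weight * keep, self.bias)
        # Mean-field shortcut at test time: use the expected weight matrix.
        return F.linear(x, self.weight * (1.0 - self.drop_prob), self.bias)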

Dropout distillation

This work introduces a novel approach, coined "dropout distillation", that trains a predictor to better approximate the intractable but preferable averaging process while keeping its computational cost under control.
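A rough sketch of that idea under my own assumptions (the function name, number of Monte Carlo samples, and KL objective are not from the paper): a student network, assumed to contain no dropout layers, is trained to match the Monte Carlo average of the dropout teacher's predictions.

import torch
import torch.nn.functional as F

def dropout_distillation_loss(teacher, student, x, num_samples=8):
    """Sketch: distill the (intractable) dropout model average into a single
    deterministic student by matching the Monte Carlo averaged teacher output."""
    teacher.train()                                   # keep dropout active in the teacher
    with torch.no_grad():
        avg = torch.stack(
            [F.softmax(teacher(x), dim=-1) for _ in range(num_samples)]
        ).mean(dim=0)                                 # approximate model average
    log_q = F.log_softmax(student(x), dim=-1)         # deterministic student prediction
    return F.kl_div(log_q, avg, reduction="batchmean")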

Gradual DropIn of Layers to Train Very Deep Neural Networks

It is shown that deep networks that are untrainable with conventional methods will converge when DropIn layers are interspersed in the architecture, and it is demonstrated that DropIn provides regularization during training in a manner analogous to dropout.

Shakeout: A New Regularized Deep Neural Network Training Scheme

This paper presents a new training scheme, Shakeout, which imposes a combination of L1 and L2 regularization on the weights, a combination proven effective in practice by Elastic Net models.

Adaptive Dropout with Rademacher Complexity Regularization

This work shows that the network's Rademacher complexity is bounded by a function of the dropout rate vectors and the weight coefficient matrices, imposes this bound as a regularizer, and thereby provides a theoretically justified way to trade off model complexity against representational power.

Dropout with Expectation-linear Regularization

This work first formulates dropout as a tractable approximation of a latent variable model, leading to a clean view of parameter sharing and enabling further theoretical analysis, and then introduces (approximate) expectation-linear dropout neural networks, whose inference gap the authors formally characterize.

Deep Networks with Stochastic Depth

Stochastic depth is proposed: a training procedure that enables the seemingly contradictory setup of training short networks while using deep networks at test time. It substantially reduces training time and significantly improves test error on almost all of the datasets used for evaluation.
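The mechanism can be sketched as a wrapper around a residual block; the wrapper and parameter names below are my own, and the paper additionally decays the survival probability linearly with depth.

import torch
import torch.nn as nn

class StochasticDepthBlock(nn.Module):
    """Sketch: skip the residual branch entirely with probability 1 - survival_prob
    during training; at test time keep it and scale it by survival_prob."""

    def __init__(self, block, survival_prob=0.8):
        super().__init__()
        self.block = block
        self.survival_prob = survival_prob

    def forward(self, x):
        if self.training:
            if torch.rand(()).item() < self.survival_prob:
                return x + self.block(x)          # block survives this pass
            return x                              # block dropped: identity shortcut only
        return x + self.survival_prob * self.block(x)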

Adaptive dropout for training deep neural networks

A method called 'standout' is described, in which a binary belief network is overlaid on a neural network and used to regularize its hidden units by selectively setting activations to zero; it achieves lower classification error rates than other feature-learning methods, including standard dropout, denoising auto-encoders, and restricted Boltzmann machines.
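A minimal sketch of the standout idea, using the weight-sharing simplification in which the overlay network reuses the layer's own pre-activations scaled and shifted by two scalars; the class name and the default alpha and beta values are assumptions, not the paper's settings.

import torch
import torch.nn as nn

class StandoutLinear(nn.Module):
    """Sketch of adaptive dropout ('standout'): each hidden unit's keep
    probability is computed by an overlaid network, here sharing the layer's
    own pre-activation scaled by alpha with offset beta."""

    def __init__(self, in_features, out_features, alpha=1.0, beta=0.0):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.alpha = alpha
        self.beta = beta

    def forward(self, x):
        pre = self.linear(x)
        h = torch.relu(pre)
        keep_prob = torch.sigmoid(self.alpha * pre + self.beta)
        if self.training:
            mask = torch.bernoulli(keep_prob)     # unit-wise stochastic gating
            return h * mask
        return h * keep_prob                      # expected activation at test time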