Corpus ID: 11245315

Dropout with Expectation-linear Regularization

@article{Ma2017DropoutWE,
  title={Dropout with Expectation-linear Regularization},
  author={Xuezhe Ma and Yingkai Gao and Zhiting Hu and Yaoliang Yu and Yuntian Deng and Eduard H. Hovy},
  journal={ArXiv},
  year={2017},
  volume={abs/1609.08017}
}
Dropout, a simple and effective way to train deep neural networks, has led to a number of impressive empirical successes and spawned many recent theoretical investigations. However, the gap between dropout's training and inference phases, introduced due to tractability considerations, has largely remained under-appreciated. In this work, we first formulate dropout as a tractable approximation of some latent variable model, leading to a clean view of parameter sharing and enabling further… 
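The regularizer the title refers to can be pictured as penalizing the gap between the deterministic ("mean network") forward pass and the average output over sampled dropout masks. Below is a minimal PyTorch-style sketch of that idea, not the paper's exact formulation: the function name, the MSE distance, and the single weighting hyperparameter are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def expectation_linear_penalty(model, x, n_samples=1):
    # Sketch of an expectation-linearization gap: distance between the
    # deterministic pass (dropout disabled) and the Monte-Carlo average over
    # sampled dropout masks. The MSE distance is an assumption; note that
    # eval()/train() also toggles layers such as BatchNorm.
    model.eval()                                   # dropout off: "mean network"
    y_det = model(x)
    model.train()                                  # dropout on: sampled sub-networks
    y_mc = torch.stack([model(x) for _ in range(n_samples)]).mean(dim=0)
    return F.mse_loss(y_mc, y_det)

# usage (hypothetical names): loss = task_loss + lam * expectation_linear_penalty(net, x_batch)
```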


R-Drop: Regularized Dropout for Neural Networks
TLDR
This paper introduces a simple consistency training strategy to regularize dropout, namely R-Drop, which forces the output distributions of different sub models generated by dropout to be consistent with each other.
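A minimal sketch of that consistency term, assuming a PyTorch classifier and a symmetric-KL weight `alpha` (both are assumptions, not details from this page):

```python
import torch.nn.functional as F

def r_drop_loss(model, x, targets, alpha=1.0):
    # Two stochastic forward passes (different dropout masks) of the same model,
    # plus a symmetric KL term that pulls their output distributions together.
    logits1, logits2 = model(x), model(x)
    ce = 0.5 * (F.cross_entropy(logits1, targets) + F.cross_entropy(logits2, targets))
    logp1, logp2 = F.log_softmax(logits1, dim=-1), F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (F.kl_div(logp1, logp2, log_target=True, reduction="batchmean")
                + F.kl_div(logp2, logp1, log_target=True, reduction="batchmean"))
    return ce + alpha * kl
```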
Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization
TLDR
This paper interprets that the conventional training methods with regularization by noise injection optimize the lower bound of the true objective and proposes a technique to achieve a tighter lower bound using multiple noise samples per training example in a stochastic gradient descent iteration.
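One way to read "multiple noise samples per training example" is an importance-weighted bound in which the sample average is taken inside the log. The sketch below assumes a classification setting and PyTorch; it illustrates the tightening, not the paper's exact objective.

```python
import math
import torch
import torch.nn.functional as F

def multi_sample_bound(model, x, targets, n_samples=4):
    # Log-likelihood of the targets under each of n_samples dropout draws.
    log_p = torch.stack([
        -F.cross_entropy(model(x), targets, reduction="none")
        for _ in range(n_samples)
    ])                                              # shape: (n_samples, batch)
    # Averaging inside the log (logsumexp - log K) gives a bound at least as
    # tight as the usual average of log-likelihoods, by Jensen's inequality.
    tighter = torch.logsumexp(log_p, dim=0) - math.log(n_samples)
    return -tighter.mean()                          # negative bound as the loss
```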
Pushing the bounds of dropout
We show that dropout training is best understood as performing MAP estimation concurrently for a family of conditional models whose objectives are themselves lower bounded by the original dropout objective.
Advanced Dropout: A Model-free Methodology for Bayesian Dropout Optimization
TLDR
The proposed advanced dropout technique applies a model-free and easily implemented distribution with a parametric prior, and adaptively adjusts the dropout rate to carry out end-to-end training of DNNs.
Dropout Training, Data-dependent Regularization, and Generalization Bounds
TLDR
By applying this framework to ReLU networks with one hidden layer, a generalization upper bound is derived with no assumptions on the parameter norms or data distribution, achieving an O(1/n) fast rate and adaptivity to the geometry of the data points at the same time.
Fraternal Dropout
TLDR
A simple technique called fraternal dropout is proposed that takes advantage of dropout to train two identical copies of an RNN (that share parameters) with different dropout masks while minimizing the difference between their (pre-softmax) predictions.
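Because the two copies share parameters, the "twin" is simply a second stochastic forward pass through the same module; a rough sketch of the penalty follows (the L2 distance and the weight `kappa` in the usage line are assumptions):

```python
import torch.nn.functional as F

def fraternal_dropout_penalty(model, x):
    # Two forward passes with independent dropout masks over shared parameters;
    # penalize the squared difference of the pre-softmax outputs.
    out_a, out_b = model(x), model(x)
    return F.mse_loss(out_a, out_b)

# usage (hypothetical names): loss = task_loss + kappa * fraternal_dropout_penalty(rnn, batch)
```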
Latent Alignment and Variational Attention
TLDR
Variational attention networks are considered, alternatives to soft and hard attention for learning latent variable alignment models, with tighter approximation bounds based on amortized variational inference, and methods for reducing the variance of gradients are proposed to make these approaches computationally feasible.
Adversarial Dropout for Recurrent Neural Networks
TLDR
It is demonstrated that minimizing the regularizer improved the effectiveness of the dropout for RNNs on sequential MNIST tasks, semi-supervised text classification tasks, and language modeling tasks.
Demystifying Dropout
TLDR
An augmented dropout is proposed, which employs different dropping strategies in the forward and backward passes to improve standard dropout and provide new insight into this line of research.

References

Showing 1-10 of 32 references
Dropout distillation
TLDR
This work introduces a novel approach, coined "dropout distillation", that allows training a predictor to better approximate the intractable, but preferable, averaging process, while keeping its computational cost under control.
Variational Dropout and the Local Reparameterization Trick
TLDR
The variational dropout method is proposed, a generalization of Gaussian dropout with a more flexibly parameterized posterior, often leading to better generalization in stochastic gradient variational Bayes.
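As an illustration of the local reparameterization trick the title refers to: instead of sampling a noisy weight matrix per minibatch, the Gaussian noise is moved to the pre-activations, whose induced mean and variance can be computed per example. The sketch below assumes multiplicative N(1, alpha) noise on a single linear layer; names are illustrative.

```python
import torch

def gaussian_dropout_linear(x, weight, alpha=0.5):
    # Local reparameterization: sample each pre-activation from the Gaussian
    # induced by multiplicative Gaussian noise on the weights, instead of
    # sampling the weights themselves (lower-variance gradients).
    mean = x @ weight.t()                           # deterministic part
    var = (x ** 2) @ (alpha * weight.t() ** 2)      # induced per-unit variance
    return mean + var.sqrt() * torch.randn_like(mean)
```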
Altitude Training: Strong Bounds for Single-Layer Dropout
TLDR
It is shown that, under a generative Poisson topic model with long documents, dropout training improves the exponent in the generalization bound for empirical risk minimization and should therefore induce minimal bias in high dimensions.
Fundamental differences between Dropout and Weight Decay in Deep Networks
TLDR
This work uncovers new properties of dropout, extends the understanding of why dropout succeeds, and lays the foundation for further progress, showing in particular that dropout is insensitive to various re-scalings of the input features, outputs, and network weights.
On the inductive bias of dropout
TLDR
This paper continues the exploration of dropout as a regularizer pioneered by Wager et al.
Dropout as a Bayesian Approximation: Insights and Applications
TLDR
It is shown that a multilayer perceptron (MLP) with arbitrary depth and non-linearities, with dropout applied after every weight layer, is mathematically equivalent to an approximation to a well known Bayesian model.
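A common practical corollary of this view is Monte-Carlo dropout at test time: keep dropout active and average predictions over sampled masks. A minimal sketch, assuming a PyTorch classifier (function name and sample count are illustrative):

```python
import torch

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=20):
    # Monte-Carlo dropout: keep dropout layers stochastic at test time and
    # average the predictive distribution over sampled masks, giving an
    # approximate Bayesian predictive mean.
    model.train()                                   # keeps dropout active (also affects BatchNorm)
    probs = torch.stack([model(x).softmax(dim=-1) for _ in range(n_samples)])
    return probs.mean(dim=0)
```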
Understanding Dropout
TLDR
A general formalism for studying dropout on either units or connections, with arbitrary probability values, is introduced and used to analyze the averaging and regularizing properties of dropout in both linear and non-linear networks.
Dropout Training as Adaptive Regularization
TLDR
By casting dropout as regularization, this work develops a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer and consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset.
To Drop or Not to Drop: Robustness, Consistency and Differential Privacy Properties of Dropout
TLDR
This paper rigorously shows that by randomly dropping a few nodes of a one-hidden layer neural network, the training objective function, up to a certain approximation error, decreases by a multiplicative factor and shows that dropout provides fast rates for generalization error in learning (convex) generalized linear models (GLM).
Dropout Training for Support Vector Machines
TLDR
An iteratively re-weighted least squares (IRLS) algorithm is developed by exploring data augmentation techniques and offering insights on the connection and difference between the hinge loss and the logistic loss in dropout training.