Pushing the bounds of dropout
@article{Melis2018PushingTB, title={Pushing the bounds of dropout}, author={G{\'a}bor Melis and Charles Blundell and Tom{\'a}s Kocisk{\'y} and K. Hermann and Chris Dyer and P. Blunsom}, journal={ArXiv}, year={2018}, volume={abs/1805.09208} }
We show that dropout training is best understood as performing MAP estimation concurrently for a family of conditional models whose objectives are themselves lower bounded by the original dropout objective. This discovery allows us to pick any model from this family after training, which leads to a substantial improvement on regularisation-heavy language modelling. The family includes models that compute a power mean over the sampled dropout masks, and their less stochastic subvariants with… CONTINUE READING
Supplemental Code
Figures, Tables, and Topics from this paper
9 Citations
A Tutorial on Deep Latent Variable Models of Natural Language
- Computer Science, Mathematics
- ArXiv
- 2018
- 28
- PDF
Improving Neural Language Models by Segmenting, Attending, and Predicting the Future
- Computer Science
- ACL
- 2019
- 4
- PDF
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
- Computer Science, Mathematics
- ACL
- 2019
- 865
- PDF
References
SHOWING 1-10 OF 28 REFERENCES
Dropout with Expectation-linear Regularization
- Computer Science, Mathematics
- ICLR
- 2017
- 24
- Highly Influential
- PDF
An empirical analysis of dropout in piecewise linear networks
- Mathematics, Computer Science
- ICLR
- 2014
- 72
- PDF
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
- Mathematics, Computer Science
- ICML
- 2016
- 2,727
- Highly Influential
- PDF