Corpus ID: 43922508

Pushing the bounds of dropout

@article{Melis2018PushingTB,
  title={Pushing the bounds of dropout},
  author={G{\'a}bor Melis and Charles Blundell and Tom{\'a}{\v{s}} Ko{\v{c}}isk{\'y} and K. Hermann and Chris Dyer and P. Blunsom},
  journal={ArXiv},
  year={2018},
  volume={abs/1805.09208}
}
  • Gábor Melis, Charles Blundell, +3 authors P. Blunsom
  • Published 2018
  • Computer Science, Mathematics
  • ArXiv
  • We show that dropout training is best understood as performing MAP estimation concurrently for a family of conditional models whose objectives are themselves lower bounded by the original dropout objective. This discovery allows us to pick any model from this family after training, which leads to a substantial improvement on regularisation-heavy language modelling. The family includes models that compute a power mean over the sampled dropout masks, and their less stochastic subvariants with…
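
The idea of taking a power mean over sampled dropout masks can be illustrated with a small sketch. The code below is not the paper's implementation: the toy linear classifier, the input-level dropout, the final renormalisation step, and all names (`power_mean_prediction`, `dropout_predict`, the exponent `p`) are assumptions made for illustration. It only shows how per-class probabilities from several stochastic forward passes could be pooled with a power mean, where `p = 1` gives the arithmetic mean and `p = 0` is treated as the geometric-mean limit.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def power_mean(values, p):
    """Power mean of positive values; p == 0 is taken as the geometric-mean limit."""
    n = len(values)
    if p == 0:
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


def dropout_predict(x, weights, keep_prob, rng):
    """One stochastic forward pass of a toy linear classifier (hypothetical model)."""
    # Sample a Bernoulli mask over the inputs and rescale the kept ones.
    masked = [xi / keep_prob if rng.random() < keep_prob else 0.0 for xi in x]
    logits = [sum(w * xi for w, xi in zip(row, masked)) for row in weights]
    return softmax(logits)


def power_mean_prediction(x, weights, keep_prob, p, num_masks, seed=0):
    """Pool class probabilities over sampled masks with a power mean."""
    rng = random.Random(seed)
    samples = [dropout_predict(x, weights, keep_prob, rng) for _ in range(num_masks)]
    # Per-class power mean across masks, then renormalise so the result sums to 1
    # (the renormalisation is an assumption of this sketch).
    pooled = [power_mean([s[c] for s in samples], p) for c in range(len(weights))]
    total = sum(pooled)
    return [v / total for v in pooled]


if __name__ == "__main__":
    weights = [[1.0, -1.0, 0.5], [-0.5, 1.0, 0.2]]  # made-up parameters
    x = [1.0, 2.0, -1.0]
    for p in (1.0, 0.0, -1.0):
        print(p, power_mean_prediction(x, weights, keep_prob=0.5, p=p, num_masks=8))
```

Because the exponent is only applied at prediction time in this sketch, different values of `p` can be compared on held-out data without retraining, which mirrors the abstract's point that a model can be picked from the family after training.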
    9 Citations

    Deep Latent Variable Models of Natural Language
    A Tutorial on Deep Latent Variable Models of Natural Language
    • 28
    Variational Smoothing in Recurrent Neural Network Language Models
    • 3
    Uncertainty in Neural Networks: Bayesian Ensembling
    • 43
    Mogrifier LSTM
    • 18
    Transformer-XL: Language Modeling
    • 2018
