Dropout Training as Adaptive Regularization

@inproceedings{Wager2013DropoutTA,
  title={Dropout Training as Adaptive Regularization},
  author={S. Wager and Sida I. Wang and Percy Liang},
  booktitle={NIPS},
  year={2013}
}
Dropout and other feature noising schemes control overfitting by artificially corrupting the training data. For generalized linear models, dropout performs a form of adaptive regularization. Using this viewpoint, we show that the dropout regularizer is first-order equivalent to an L2 regularizer applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix. We also establish a connection to AdaGrad, an online learning algorithm, and find that a close relative of AdaGrad operates by solving a similar adaptive regularization problem. By casting dropout as regularization, we develop a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer. We apply this idea to document classification tasks, and show that it consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset.
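To make the first-order equivalence concrete, here is a minimal NumPy sketch of the quadratic approximation of the dropout regularizer for logistic regression: an L2 penalty in which coordinate j is weighted by sum_i p_i (1 - p_i) x_ij^2, an estimate of the j-th diagonal entry of the Fisher information. This is an illustration under those assumptions, not the paper's own code; the function name and the dropout rate `delta` are ours.

```python
import numpy as np

def quadratic_dropout_penalty(beta, X, delta=0.5):
    """Quadratic approximation of the dropout regularizer for logistic
    regression: an L2 penalty on beta whose j-th weight is
    sum_i p_i * (1 - p_i) * x_ij**2, an estimate of the j-th diagonal
    entry of the Fisher information matrix (illustrative sketch)."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))      # current model probabilities
    fisher_diag = (p * (1.0 - p)) @ (X ** 2)   # one weight per feature
    return 0.5 * (delta / (1.0 - delta)) * np.sum(fisher_diag * beta ** 2)

# Example: evaluate the penalty for a candidate weight vector on random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
beta = rng.normal(size=5)
print(quadratic_dropout_penalty(beta, X, delta=0.5))
```

The AdaGrad connection has the same flavor: AdaGrad also rescales each coordinate by an accumulated squared-gradient (Fisher-like) quantity. A standard sketch of one AdaGrad step, again illustrative rather than taken from the paper:

```python
def adagrad_step(beta, grad, hist, lr=0.1, eps=1e-8):
    """One AdaGrad update: per-coordinate step sizes shrink with the
    accumulated squared gradients."""
    hist = hist + grad ** 2                       # running second moments
    return beta - lr * grad / (np.sqrt(hist) + eps), hist
```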
    407 Citations

    • Dropout training for SVMs with data augmentation (4 citations; highly influenced)
    • Dropout Training for Support Vector Machines (32 citations)
    • Curriculum Dropout (32 citations; highly influenced)
    • The Implicit and Explicit Regularization Effects of Dropout (14 citations; highly influenced)
    • On the inductive bias of dropout (42 citations; highly influenced)
    • Dropout with Expectation-linear Regularization (24 citations)
    • Dropout Training, Data-dependent Regularization, and Generalization Bounds (12 citations; highly influenced)
    • An ETF view of Dropout regularization (1 citation)
    • Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization (77 citations)
    • Feature Noising for Log-Linear Structured Prediction (22 citations)

    References

    Showing 1-10 of 35 references.
    • Fast dropout training (320 citations)
    • Feature Noising for Log-Linear Structured Prediction (22 citations)
    • Learning with Marginalized Corrupted Features (136 citations; highly influential)
    • Adaptive regularization of weight vectors (329 citations)
    • Training with Noise is Equivalent to Tikhonov Regularization (806 citations; highly influential)
    • Adding noise to the input of a model trained with a regularized objective (58 citations)
    • Semi-supervised Learning by Entropy Minimization (838 citations)
    • Adaptive Subgradient Methods for Online Learning and Stochastic Optimization (6,437 citations)
    • The Manifold Tangent Classifier (226 citations)