Corpus ID: 10482362

Revisiting Activation Regularization for Language RNNs

  • Stephen Merity, B. McCann, R. Socher
  • Published 2017
  • Computer Science
  • ArXiv
  • Recurrent neural networks (RNNs) serve as a fundamental building block for many sequence tasks across natural language processing. Recent research has focused on recurrent dropout techniques or custom RNN cells in order to improve performance. Both of these can require substantial modifications to the machine learning model or to the underlying RNN configurations. We revisit traditional regularization techniques, specifically L2 regularization on RNN activations and slowness regularization over…
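The abstract names two penalties: L2 regularization on RNN activations and slowness regularization. Below is a minimal sketch, assuming both are added to the task loss as (a) `alpha` times the mean squared hidden activation and (b) `beta` times the mean squared difference between consecutive hidden states. The function names, the coefficients `alpha` and `beta`, and the use of means rather than sums are illustrative assumptions, not the paper's code; the sketch also ignores dropout masks and which RNN layer is penalized, since the truncated abstract does not specify them.

```python
from typing import List

Vector = List[float]  # one hidden state h_t


def activation_regularization(hidden: List[Vector], alpha: float) -> float:
    """Hypothetical AR term: alpha * mean of h_t[i]**2 over all steps and units."""
    values = [x for h in hidden for x in h]
    if not values:  # no hidden states: no penalty
        return 0.0
    return alpha * sum(x * x for x in values) / len(values)


def temporal_activation_regularization(hidden: List[Vector], beta: float) -> float:
    """Hypothetical slowness term: beta * mean of (h_t[i] - h_{t+1}[i])**2."""
    diffs = [a - b
             for h_t, h_next in zip(hidden, hidden[1:])
             for a, b in zip(h_t, h_next)]
    if not diffs:  # fewer than two time steps: nothing to compare
        return 0.0
    return beta * sum(d * d for d in diffs) / len(diffs)


def regularized_loss(task_loss: float, hidden: List[Vector],
                     alpha: float, beta: float) -> float:
    """Task loss (e.g. cross-entropy) plus both hypothetical activation penalties."""
    return (task_loss
            + activation_regularization(hidden, alpha)
            + temporal_activation_regularization(hidden, beta))


if __name__ == "__main__":
    hs = [[1.0, 0.0], [1.0, 2.0], [0.0, 2.0]]  # toy hidden states: T=3 steps, 2 units
    print(round(activation_regularization(hs, alpha=2.0), 4))  # 3.3333
    print(temporal_activation_regularization(hs, beta=1.0))    # 1.25
```

Because both terms only read hidden states the RNN already produces, adding them leaves the RNN cell itself untouched, which is the contrast the abstract draws with recurrent dropout and custom cells.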
    26 Citations
    • Regularizing and Optimizing LSTM Language Models (619 citations)
    • Fraternal Dropout (21 citations; Highly Influenced)
    • Improving Neural Language Models with Weight Norm Initialization and Regularization (4 citations; Highly Influenced)
    • Variational Bi-LSTMs (12 citations)
    • Adversarial Dropout for Recurrent Neural Networks (2 citations; Highly Influenced)
    • Learning Architectures from an Extended Search Space for Language Modeling (5 citations; Highly Influenced)
    • Tailoring an Interpretable Neural Language Model (1 citation)
    • Highway State Gating for Recurrent Highway Networks: Improving Information Flow Through Time
    • Improving Image Captioning with Language Modeling Regularizations

