Corpus ID: 212756

Regularizing and Optimizing LSTM Language Models

@article{Merity2018RegularizingAO,
  title={Regularizing and Optimizing LSTM Language Models},
  author={Stephen Merity and Nitish Shirish Keskar and Richard Socher},
  journal={ArXiv},
  year={2018},
  volume={abs/1708.02182}
}
Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks, including machine translation, language modeling, and question answering. [...] Further, we introduce NT-ASGD, a variant of the averaged stochastic gradient method, wherein the averaging trigger is determined using a non-monotonic condition as opposed to being tuned by the user. Using these and other regularization strategies, we achieve state-of…
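As a rough illustration of the non-monotonic trigger described in the abstract, the sketch below switches on averaging once the validation loss stops improving relative to losses seen a window of checks earlier. The window size `n` and the name `should_trigger_averaging` are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of a non-monotonic averaging trigger in the spirit of NT-ASGD.
# All names and the window size `n` are illustrative assumptions.

def should_trigger_averaging(val_losses, n=5):
    """Return True once the latest validation loss is no better than the
    best loss observed at least `n` evaluations ago, i.e. validation
    performance has stopped improving monotonically."""
    if len(val_losses) <= n:
        return False
    return val_losses[-1] > min(val_losses[:-n])


# Usage sketch inside a training loop:
# if not averaging and should_trigger_averaging(val_losses):
#     averaging = True  # from here on, maintain the running average of SGD iterates
```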
Controlling Global Statistics in Recurrent Neural Network Text Generation
TLDR
A dynamic regularizer that updates as training proceeds, based on the generative behavior of the RNN language model, is presented; it improves model perplexity when the statistical constraints are n-gram statistics taken from a large corpus.
Transformer-XL: Language Modeling
2018
We propose a novel neural architecture, Transformer-XL, for modeling longer-term dependency. To address the limitation of fixed-length contexts, we introduce a notion of recurrence by reusing the …
Pyramidal Recurrent Unit for Language Modeling
TLDR
The Pyramidal Recurrent Unit (PRU) is introduced, which enables learning representations in high dimensional space with more generalization power and fewer parameters, and outperforms all previous RNN models that exploit different gating mechanisms and transformations.
Tree-Like Context Embeddings and Advisers for Neural Language Models
Statistical language models are key to many prominent applications in natural language processing. State-of-the-art language models are built around recurrent neural networks and form the basis for …
Improved Sentence Modeling using Suffix Bidirectional LSTM
TLDR
This work proposes a general and effective improvement to the BiLSTM model which encodes each suffix and prefix of a sequence of tokens in both forward and reverse directions, and introduces an alternate bias that favors long range dependencies.
Regularized Training of Nearest Neighbor Language Models
TLDR
This paper builds upon kNN-LM, which uses a pre-trained language model together with an exhaustive kNN search through the training data (memory bank) to achieve state-of-the-art results, and finds that the added L2 regularization seems to improve the performance for high-frequency words without deteriorating the performance for low-frequency ones.
Memory Architectures in Recurrent Neural Network Language Models
TLDR
The results demonstrate the value of stack-structured memory for explaining the distribution of words in natural language, in line with linguistic theories claiming a context-free backbone for natural language.
Multiplicative LSTM for sequence modelling
TLDR
It is demonstrated empirically that mLSTM outperforms standard LSTM and its deep variants for a range of character-level language modelling tasks, and it is argued that this makes mLSTM more expressive for autoregressive density estimation.
DeFINE: DEep Factorized INput Word Embeddings for Neural Sequence Modeling
TLDR
A new method, DeFINE, is described for learning deep word-level representations efficiently; it uses a hierarchical structure with novel skip-connections that allow for the use of low-dimensional input and output layers, reducing total parameters and training time while delivering similar or better performance versus existing methods.
Language Models with Transformers
TLDR
This paper explores effective Transformer architectures for language modeling, including adding additional LSTM layers to better capture the sequential context while still keeping the computation efficient, and proposes Coordinate Architecture Search (CAS) to find an effective architecture through iterative refinement of the model.

References

Showing 1–10 of 46 references
Context dependent recurrent neural network language model
TLDR
This paper improves the performance of recurrent neural network language models by providing, alongside each word, a contextual real-valued input vector that conveys information about the sentence being modeled; the vector is obtained by performing Latent Dirichlet Allocation on a block of preceding text.
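A minimal sketch of that idea, assuming a PyTorch-style model and a precomputed topic vector (e.g. an LDA topic mixture over the preceding block of text); the layer sizes and names below are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class ContextRNNLM(nn.Module):
    """Illustrative sketch: concatenate a real-valued context vector to each
    word embedding before the recurrent layer. Sizes and the name `topic_dim`
    are assumptions for illustration."""
    def __init__(self, vocab_size=10000, emb_dim=200, topic_dim=40, hidden=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.RNN(emb_dim + topic_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens, topic_vec):
        # tokens: (batch, time); topic_vec: (batch, topic_dim)
        emb = self.embed(tokens)                                  # (B, T, E)
        ctx = topic_vec.unsqueeze(1).expand(-1, emb.size(1), -1)  # repeat over time
        out, _ = self.rnn(torch.cat([emb, ctx], dim=-1))
        return self.out(out)                                      # (B, T, V) logits
```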
Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling
TLDR
This work introduces a novel theoretical framework that facilitates better learning in language modeling, and shows that this framework leads to tying together the input embedding and the output projection matrices, greatly reducing the number of trainable variables.
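A common way to realize this tying in practice is to have the output projection share the input embedding's weight matrix; the sketch below uses PyTorch-style modules with assumed layer sizes and is not the paper's code.

```python
import torch.nn as nn

class TiedLM(nn.Module):
    """Sketch of weight tying: the output projection reuses the input
    embedding matrix, so embedding and hidden sizes must match.
    Layer sizes are arbitrary assumptions."""
    def __init__(self, vocab_size=10000, emb_dim=400):
        super().__init__()
        self.encoder = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, emb_dim, batch_first=True)
        self.decoder = nn.Linear(emb_dim, vocab_size, bias=False)
        self.decoder.weight = self.encoder.weight  # tie the two matrices

    def forward(self, tokens):
        emb = self.encoder(tokens)
        out, _ = self.rnn(emb)
        return self.decoder(out)  # logits over the vocabulary
```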
Revisiting Activation Regularization for Language RNNs
TLDR
Traditional regularization techniques are revisited, specifically L2 regularization on RNN activations and slowness regularization over successive hidden states, to improve the performance of RNNs on the task of language modeling.
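A sketch of the two penalties that summary mentions, as they are commonly added to the training loss; the coefficient names `alpha`/`beta` and the tensor layout are assumptions for illustration.

```python
import torch

def ar_tar_penalty(hidden, alpha=2.0, beta=1.0):
    """Sketch of activation regularization (L2 on the RNN activations) and
    a slowness penalty on the difference between successive hidden states.
    `hidden` is assumed to be (batch, time, dim)."""
    ar = alpha * hidden.pow(2).mean()
    tar = beta * (hidden[:, 1:] - hidden[:, :-1]).pow(2).mean()
    return ar + tar

# Usage sketch: loss = cross_entropy + ar_tar_penalty(rnn_outputs)
```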
Recurrent Highway Networks
TLDR
A novel theoretical analysis of recurrent networks based on Geršgorin's circle theorem is introduced that illuminates several modeling and optimization issues and improves the understanding of the LSTM cell.
Character-Aware Neural Language Models
TLDR
A simple neural language model that relies only on character-level inputs is presented; it is able to encode, from characters only, both semantic and orthographic information, suggesting that for many languages character inputs are sufficient for language modeling.
On the State of the Art of Evaluation in Neural Language Models
TLDR
This work reevaluates several popular architectures and regularisation methods with large-scale automatic black-box hyperparameter tuning and arrives at the somewhat surprising conclusion that standard LSTM architectures, when properly regularised, outperform more recent models.
Neural Architecture Search with Reinforcement Learning
TLDR
This paper uses a recurrent network to generate the model descriptions of neural networks and trains this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.
Pointer Sentinel Mixture Models
TLDR
The pointer sentinel-LSTM model achieves state-of-the-art language modeling performance on the Penn Treebank while using far fewer parameters than a standard softmax LSTM, and the freely available WikiText corpus is introduced.
Quasi-Recurrent Neural Networks
TLDR
Quasi-recurrent neural networks (QRNNs), an approach to neural sequence modeling that alternates convolutional layers, which apply in parallel across timesteps, and a minimalist recurrent pooling function that applies in parallel across channels, are introduced.
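One way to picture the minimalist recurrent pooling step is as an elementwise gated running average applied independently per channel, with the gates produced by the convolutional layers. The sketch below (assumed gate names `z`, `f`, `o` and shapes) illustrates that idea and is not the authors' implementation.

```python
import torch

def fo_pool(z, f, o):
    """Sketch of a gated recurrent pooling step: gates z, f, o are assumed
    to come from convolutions over the input, and the only recurrence is an
    elementwise running average. Shapes are (batch, time, channels)."""
    c = torch.zeros_like(z[:, 0])
    outputs = []
    for t in range(z.size(1)):
        c = f[:, t] * c + (1 - f[:, t]) * z[:, t]  # per-channel running average
        outputs.append(o[:, t] * c)                # output gate
    return torch.stack(outputs, dim=1)
```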
Using the Output Embedding to Improve Language Models
TLDR
The topmost weight matrix of neural network language models is studied, and it is shown that this matrix constitutes a valid word embedding; a new method of regularizing the output embedding is also offered.