Controlling Global Statistics in Recurrent Neural Network Text Generation
@inproceedings{Noraset2018ControllingGS,
  title     = {Controlling Global Statistics in Recurrent Neural Network Text Generation},
  author    = {Thanapon Noraset and David Demeter and Doug Downey},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2018}
}
Recurrent neural network language models (RNNLMs) are an essential component for many language generation tasks such as machine translation, summarization, and automated conversation. Often, we would like to subject the text generated by the RNNLM to constraints, in order to overcome systemic errors (e.g. word repetition) or achieve application-specific goals (e.g. more positive sentiment). In this paper, we present a method for training RNNLMs to simultaneously optimize likelihood and…
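The abstract above is cut off, so the paper's exact training objective is not shown here. As a rough, hypothetical illustration of the general idea it describes (jointly optimizing likelihood and a constraint on a global statistic of the generated text), the PyTorch sketch below adds a penalty that pushes the model's expected frequency of one token toward a target value; the model, penalty form, and all names are assumptions for illustration, not the paper's method.

```python
# Illustrative sketch only: standard LM cross-entropy plus a penalty that
# nudges a global statistic of the model -- here, the expected per-position
# probability of a chosen token (e.g. to discourage over-frequent words) --
# toward a target value.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 1000, 64, 128

class TinyRNNLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):                 # tokens: (batch, time)
        h, _ = self.rnn(self.embed(tokens))
        return self.out(h)                     # (batch, time, vocab) logits

model = TinyRNNLM()
ce = nn.CrossEntropyLoss()

def constrained_loss(tokens, targets, token_id, target_freq, weight=1.0):
    logits = model(tokens)
    nll = ce(logits.reshape(-1, vocab_size), targets.reshape(-1))
    # Expected frequency of `token_id` under the model's output distribution.
    freq = torch.softmax(logits, dim=-1)[..., token_id].mean()
    penalty = (freq - target_freq) ** 2        # constraint on a global statistic
    return nll + weight * penalty

tokens = torch.randint(0, vocab_size, (4, 20))
targets = torch.randint(0, vocab_size, (4, 20))
constrained_loss(tokens, targets, token_id=3, target_freq=0.01).backward()
```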
6 Citations
Estimating Marginal Probabilities of n-grams for Recurrent Neural Language Models
- 2018
Computer Science
EMNLP
This paper studies how to compute an RNNLM's marginal probability: the probability that the model assigns to a short sequence of text when the preceding context is not known. It shows how to use these marginal estimates to improve an RNNLM by training its marginals to match n-gram probabilities from a larger corpus.
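One straightforward way to approximate such a marginal (not necessarily the estimator used in the paper) is a Monte Carlo average: sample preceding contexts from the RNNLM itself and average the conditional probability of the span under each sampled context. The sketch assumes a model(tokens) interface that returns per-position next-token logits of shape (batch, time, vocab).

```python
# Hypothetical Monte Carlo estimate of an RNNLM's marginal probability of a
# short token sequence: sample contexts from the model, then average
# P(span | sampled context) over the samples.
import torch

@torch.no_grad()
def marginal_prob(model, span, bos_id, context_len=10, n_samples=64):
    device = next(model.parameters()).device
    ctx = torch.full((n_samples, 1), bos_id, dtype=torch.long, device=device)
    for _ in range(context_len):               # sample contexts autoregressively
        logits = model(ctx)[:, -1, :]
        nxt = torch.multinomial(torch.softmax(logits, dim=-1), 1)
        ctx = torch.cat([ctx, nxt], dim=1)
    span_t = torch.tensor(span, device=device).unsqueeze(0).expand(n_samples, -1)
    full = torch.cat([ctx, span_t], dim=1)
    logp = torch.log_softmax(model(full[:, :-1]), dim=-1)
    tok_lp = logp.gather(-1, full[:, 1:].unsqueeze(-1)).squeeze(-1)
    span_lp = tok_lp[:, -len(span):].sum(dim=1)   # log P(span | context)
    return span_lp.exp().mean().item()            # average over sampled contexts

# e.g. marginal_prob(TinyRNNLM(), span=[5, 9, 2], bos_id=0) with the toy model above
```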
Language Modelling via Learning to Rank
- 2022
Computer Science
AAAI
It is shown that rank-based knowledge distillation (KD) generally gives a modest, though often statistically significant, improvement in perplexity (PPL) compared to Kullback–Leibler-based KD, and a method is developed that uses n-grams to build a non-probabilistic teacher which generates the ranks without needing a pre-trained LM.
Autoregressive Text Generation Beyond Feedback Loops
- 2019
Computer Science
EMNLP
This paper combines a latent state space model with a CRF observation model and argues that such autoregressive observation models form an interesting middle ground that expresses local correlations on the word level but keeps the state evolution non-autoregressive.
Multi-sense Definition Modeling using Word Sense Decompositions
- 2019
Computer Science
ArXiv
This paper introduces a new method that enables definition modeling for multiple senses of the same word, and shows how a Gumbel-Softmax approach outperforms baselines at matching sense-specific embeddings to definitions during training.
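As a small hedged illustration of the Gumbel-Softmax ingredient mentioned in this summary (the function and shapes are assumptions, not the paper's code): a nearly discrete but differentiable choice among candidate sense embeddings can be written as follows.

```python
# Hypothetical sketch: pick one sense embedding with a Gumbel-Softmax sample,
# so the (near-)discrete choice remains differentiable during training.
import torch
import torch.nn.functional as F

def select_sense(sense_embeddings, sense_scores, tau=0.5):
    # sense_embeddings: (n_senses, dim); sense_scores: (n_senses,) unnormalized
    weights = F.gumbel_softmax(sense_scores, tau=tau, hard=True)  # ~one-hot
    return weights @ sense_embeddings                             # (dim,)

senses = torch.randn(4, 32)                 # 4 candidate senses, 32-dim each
print(select_sense(senses, torch.randn(4)).shape)   # torch.Size([32])
```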
Automatic diagnosis of sleep apnea from biomedical signals using artificial intelligence techniques: Methods, challenges, and future works
- 2022
Computer Science
WIREs Data Mining Knowl. Discov.
The challenges of diagnosing sleep apnea with AI methods are of paramount importance for researchers, and these obstacles are addressed in detail.
Inspecting and Directing Neural Language Models
- 2018
Psychology
This work investigates methods for inspecting and directing neural language models, including for reinforcement learning problems.
41 References
Regularizing and Optimizing LSTM Language Models
- 2018
Computer Science
ICLR
This paper proposes the weight-dropped LSTM, which uses DropConnect on the hidden-to-hidden weights as a form of recurrent regularization, and introduces NT-ASGD, a variant of the averaged stochastic gradient method in which the averaging trigger is determined by a non-monotonic condition rather than being tuned by the user.
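The core trick named in this summary, DropConnect on the recurrent (hidden-to-hidden) weights, can be sketched with a hand-written LSTM cell. This is a simplified stand-in for the actual AWD-LSTM implementation; for brevity the mask is re-sampled at every step, whereas the original samples it once per forward pass.

```python
# Simplified weight drop: randomly zero entries of the hidden-to-hidden weight
# matrix (DropConnect) before using it in the LSTM recurrence.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDropLSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size, weight_drop=0.5):
        super().__init__()
        self.w_ih = nn.Parameter(torch.randn(4 * hidden_size, input_size) * 0.1)
        self.w_hh = nn.Parameter(torch.randn(4 * hidden_size, hidden_size) * 0.1)
        self.bias = nn.Parameter(torch.zeros(4 * hidden_size))
        self.weight_drop = weight_drop

    def forward(self, x, state):
        h, c = state
        # DropConnect on the recurrent weights only.
        w_hh = F.dropout(self.w_hh, p=self.weight_drop, training=self.training)
        i, f, g, o = (x @ self.w_ih.t() + h @ w_hh.t() + self.bias).chunk(4, dim=-1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

cell = WeightDropLSTMCell(16, 32)
h = c = torch.zeros(8, 32)
for x_t in torch.randn(5, 8, 16):           # 5 timesteps, batch of 8
    out, (h, c) = cell(x_t, (h, c))
```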
Sequence Level Training with Recurrent Neural Networks
- 2016
Computer Science
ICLR
This work proposes a novel sequence level training algorithm that directly optimizes the metric used at test time, such as BLEU or ROUGE, and outperforms several strong baselines for greedy generation.
Generating Sentences from a Continuous Space
- 2016
Computer Science
CoNLL
This work introduces and studies an RNN-based variational autoencoder generative model that incorporates distributed latent representations of entire sentences, which allows it to explicitly model holistic properties of sentences such as style, topic, and high-level syntactic features.
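A minimal sketch of the objective behind such a model (encoder and decoder RNNs omitted; names are illustrative, not the paper's code): token reconstruction cross-entropy plus the KL divergence of the Gaussian posterior over the sentence code from a standard normal prior, with the KL weight typically annealed during training.

```python
# Sketch of a sentence-VAE loss: reconstruction NLL plus KL(q(z|x) || N(0, I)),
# where q(z|x) = N(mu, exp(logvar)) comes from the encoder RNN.
import torch
import torch.nn.functional as F

def sentence_vae_loss(recon_logits, targets, mu, logvar, kl_weight=1.0):
    nll = F.cross_entropy(recon_logits.reshape(-1, recon_logits.size(-1)),
                          targets.reshape(-1), reduction="sum")
    kl = -0.5 * torch.sum(1.0 + logvar - mu.pow(2) - logvar.exp())
    return nll + kl_weight * kl     # kl_weight is typically annealed from 0 to 1
```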
Exploring the Limits of Language Modeling
- 2016
Computer Science
ArXiv
This work explores recent advances in Recurrent Neural Networks for large-scale Language Modeling, and extends current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and the complex, long-term structure of language.
Sequence to Sequence Learning with Neural Networks
- 2014
Computer Science
NIPS
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and the target sentence, which made the optimization problem easier.
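The reversal trick highlighted in this summary is purely a preprocessing step on the source side; a tiny illustrative helper (padding convention assumed) might look like this.

```python
# Reverse each source sequence so early target words sit close to the source
# words they depend on; padding is kept at the end.
def reverse_source(batch, pad_id=0):
    out = []
    for seq in batch:
        tokens = [t for t in seq if t != pad_id]
        out.append(list(reversed(tokens)) + [pad_id] * (len(seq) - len(tokens)))
    return out

print(reverse_source([[5, 8, 2, 0, 0]]))    # [[2, 8, 5, 0, 0]]
```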
An Actor-Critic Algorithm for Sequence Prediction
- 2017
Computer Science
ICLR
An approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL) is proposed, in which the critic network is conditioned on the ground-truth output; the method leads to improved performance on both a synthetic task and German-English machine translation.
Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
- 2015
Computer Science
NIPS
This work proposes a curriculum learning strategy to gently change the training process from a fully guided scheme using the true previous token, towards a less guided scheme which mostly uses the generated token instead.
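A hedged sketch of one decoding step under such a curriculum (function names and the specific decay schedule are illustrative): with probability eps feed the ground-truth previous token, otherwise feed a token sampled from the model, and decay eps as training proceeds.

```python
# Scheduled sampling for a single decoding step.
import math
import random
import torch

def choose_prev_token(gold_prev, prev_logits, eps):
    """gold_prev: (batch,) true tokens; prev_logits: (batch, vocab)."""
    if random.random() < eps:
        return gold_prev                                  # fully guided step
    probs = torch.softmax(prev_logits, dim=-1)
    return torch.multinomial(probs, 1).squeeze(-1)        # model's own token

def inverse_sigmoid_decay(step, k=1000.0):
    # One of the decay schedules proposed for scheduled sampling.
    return k / (k + math.exp(step / k))
```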
On the State of the Art of Evaluation in Neural Language Models
- 2018
Computer Science
ICLR
This work reevaluates several popular architectures and regularisation methods with large-scale automatic black-box hyperparameter tuning and arrives at the somewhat surprising conclusion that standard LSTM architectures, when properly regularised, outperform more recent models.
A Deep Reinforced Model for Abstractive Summarization
- 2018
Computer Science
ICLR
A neural network model with a novel intra-attention that attends over the input and the continuously generated output separately, together with a new training method that combines standard supervised word prediction and reinforcement learning (RL), produces higher-quality summaries.
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
- 2016
Computer Science
NIPS
This work applies a new variational-inference-based dropout technique in LSTM and GRU models, which outperforms existing techniques and, to the best of the authors' knowledge, improves on the single-model state of the art in language modelling with the Penn Treebank.
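The key mechanical difference from standard dropout, sampling one mask per sequence and reusing it at every timestep, can be sketched as the "locked dropout" module below, here applied to an RNN layer's inputs or outputs. This is an illustration of the idea only; the paper's formulation also ties masks inside the recurrence and on embeddings.

```python
# Variational ("locked") dropout sketch: one Bernoulli mask per sequence,
# reused at every timestep, instead of a fresh mask at each step.
import torch
import torch.nn as nn

class LockedDropout(nn.Module):
    def __init__(self, p=0.5):
        super().__init__()
        self.p = p

    def forward(self, x):                      # x: (batch, time, features)
        if not self.training or self.p == 0.0:
            return x
        mask = x.new_empty(x.size(0), 1, x.size(2)).bernoulli_(1 - self.p)
        return x * mask / (1 - self.p)         # same mask at every timestep

drop = LockedDropout(0.3)
y = drop(torch.randn(8, 20, 128))   # same features zeroed across all 20 steps
```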