Consistency of a Recurrent Language Model with Respect to Incomplete Decoding

@inproceedings{Welleck2020ConsistencyOA,
  title={Consistency of a Recurrent Language Model with Respect to Incomplete Decoding},
  author={Sean Welleck and Ilia Kulikov and Jaedeok Kim and Richard Yuanzhe Pang and Kyunghyun Cho},
  booktitle={EMNLP},
  year={2020}
}
Despite strong performance on a variety of tasks, neural sequence models trained with maximum likelihood have been shown to exhibit issues such as length bias and degenerate repetition. We study the related issue of receiving infinite-length sequences from a recurrent language model when using common decoding algorithms. To analyze this issue, we first define inconsistency of a decoding algorithm, meaning that the algorithm can yield an infinite-length sequence that has zero probability under…
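To make the definition concrete, here is a minimal toy sketch, in Python, of the failure mode the abstract describes. It is not the paper's code: the three-token vocabulary, the fixed next-token distribution, and the step limit are illustrative assumptions. Because top-k decoding renormalizes over only the k most probable tokens, a model that always gives <eos> some mass, but never enough to enter the top k, yields a decoder that can never terminate.

import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary: two content tokens plus an end-of-sequence token.
VOCAB = ["a", "b", "<eos>"]
EOS = 2

def toy_next_token_probs(prefix):
    """Stand-in for a recurrent LM's next-token distribution: <eos> always
    receives nonzero mass but never ranks among the two most probable tokens."""
    return np.array([0.48, 0.42, 0.10])

def top_k_decode(k, max_steps=50):
    """Incomplete decoding: renormalize over the k most probable tokens only,
    then sample. If <eos> always falls outside the top k, its renormalized
    probability is zero at every step, so the sequence cannot end."""
    prefix = []
    for _ in range(max_steps):
        p = toy_next_token_probs(prefix)
        kept = np.argsort(p)[-k:]            # indices surviving the top-k cut
        q = np.zeros_like(p)
        q[kept] = p[kept] / p[kept].sum()    # renormalized truncated distribution
        nxt = int(rng.choice(len(p), p=q))
        prefix.append(nxt)
        if nxt == EOS:
            return prefix, True
    return prefix, False

seq, terminated = top_k_decode(k=2)
print("terminated:", terminated)             # False: <eos> was never sampled
print("prefix starts:", [VOCAB[i] for i in seq[:8]])

Greedy search is the k=1 case and fails in the same way whenever <eos> is never the single most probable token.
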
Citations

De-Anonymizing Text by Fingerprinting Language Generation
TLDR: The study of code security of ML systems is initiated by investigating how nucleus sampling unwittingly leaks texts typed by users, finding that the series of nucleus sizes for many natural English word sequences is a unique fingerprint.
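As a rough illustration of the fingerprint described in the entry above, the sketch below records the nucleus size, i.e. the size of the smallest set of most probable tokens whose cumulative probability reaches top_p, at each position of a sequence. The per-step distributions and the helper names (nucleus_size, fingerprint) are hypothetical, not the cited paper's code.

import numpy as np

def nucleus_size(probs, top_p=0.95):
    """Size of the smallest set of most probable tokens whose cumulative
    probability reaches top_p (the 'nucleus' used by nucleus sampling)."""
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    return int(np.searchsorted(cum, top_p) + 1)

def fingerprint(step_distributions, top_p=0.95):
    """Series of nucleus sizes along a sequence; the cited work observes that
    this series is often unique to the underlying text."""
    return [nucleus_size(p, top_p) for p in step_distributions]

# Hypothetical per-step next-token distributions from some language model.
dists = [np.array([0.6, 0.3, 0.05, 0.05]),
         np.array([0.25, 0.25, 0.25, 0.25]),
         np.array([0.9, 0.05, 0.03, 0.02])]
print(fingerprint(dists))   # [3, 4, 2]
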
A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation
TLDR: This work uses the quality-diversity (Q-D) trade-off to investigate three popular sampling methods (top-k, nucleus and tempered sampling), and designs two sets of new sampling methods that satisfy three key properties: entropy reduction, order preservation, and slope preservation.
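For reference, a minimal numpy sketch of the three probability transforms compared in the entry above; the example distribution and hyperparameter values are illustrative, and the paper studies these transforms on full language-model vocabularies.

import numpy as np

def top_k_probs(p, k):
    """Keep only the k most probable tokens, then renormalize."""
    q = np.zeros_like(p)
    keep = np.argsort(p)[-k:]
    q[keep] = p[keep]
    return q / q.sum()

def nucleus_probs(p, top_p):
    """Keep the smallest set of most probable tokens whose cumulative
    probability reaches top_p, then renormalize."""
    order = np.argsort(p)[::-1]
    cutoff = np.searchsorted(np.cumsum(p[order]), top_p) + 1
    q = np.zeros_like(p)
    q[order[:cutoff]] = p[order[:cutoff]]
    return q / q.sum()

def tempered_probs(p, temperature):
    """Rescale log-probabilities by 1/temperature; t < 1 sharpens the
    distribution (entropy reduction), t > 1 flattens it."""
    logits = np.log(p) / temperature
    e = np.exp(logits - logits.max())
    return e / e.sum()

p = np.array([0.5, 0.3, 0.15, 0.05])
print(top_k_probs(p, k=2))          # mass only on the two largest entries
print(nucleus_probs(p, top_p=0.9))  # smallest prefix of sorted tokens covering 0.9
print(tempered_probs(p, 0.7))       # sharpened version of p
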
Text Generation by Learning from Demonstrations
TLDR: It is found that GOLD outperforms the baselines according to automatic and human evaluation on summarization, question generation, and machine translation, including attaining state-of-the-art results for CNN/DailyMail summarization.
A Task-Oriented Dialogue Architecture via Transformer Neural Language Models and Symbolic Injection
Recently, transformer language models have been applied to build both task- and non-task-oriented dialogue systems. Although transformers perform well on most NLP tasks, they perform poorly on…
MAUVE: Human-Machine Divergence Curves for Evaluating Open-Ended Text Generation
TLDR: MAUVE is a metric for open-ended text generation that directly compares the distribution of machine-generated text to that of human language; evaluation under MAUVE reflects more natural behavior with respect to model size than prior metrics.
Controllable Neural Natural Language Generation: comparison of state-of-the-art control strategies
Most NLG systems target text fluency and grammatical correctness, disregarding control over text structure and length. However, control over the output plays an important part in industrial NLG…
MLE-guided parameter search for task loss minimization in neural sequence modeling
TLDR: This paper proposes maximum likelihood guided parameter search (MGS), which samples from a distribution over update directions that is a mixture of random search around the current parameters and around the maximum likelihood gradient, with each direction weighted by its improvement in the task loss.
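The sketch below is a toy reading of the update rule summarized above: candidate update directions are drawn from a mixture of random search around the current parameters and around a scaled maximum-likelihood gradient, and each candidate is weighted by its improvement in a non-differentiable task loss. The quadratic surrogate loss, rounded-error task loss, and every hyperparameter here are stand-ins, not the paper's actual setup.

import numpy as np

rng = np.random.default_rng(0)

target = np.array([2.0, -1.0])   # toy "correct" parameters

def mle_loss_grad(theta):
    """Gradient of a smooth maximum-likelihood surrogate (here, squared error)."""
    return 2.0 * (theta - target)

def task_loss(theta):
    """Non-differentiable test-time metric (here, a rounded error count)."""
    return float(np.sum(np.round(np.abs(theta - target))))

def mgs_step(theta, n_candidates=10, lr=0.1, noise=0.1, temperature=1.0):
    """One MLE-guided parameter search update: sample directions around zero
    (random search) or around the scaled MLE gradient, then combine them
    weighted by how much each one improves the task loss."""
    base = task_loss(theta)
    g = lr * mle_loss_grad(theta)
    directions, weights = [], []
    for _ in range(n_candidates):
        eps = noise * rng.standard_normal(theta.shape)
        d = eps if rng.random() < 0.5 else g + eps       # mixture component
        improvement = base - task_loss(theta - d)
        directions.append(d)
        weights.append(np.exp(improvement / temperature))
    w = np.array(weights) / np.sum(weights)
    return theta - np.sum(w[:, None] * np.array(directions), axis=0)

theta = np.zeros(2)
for _ in range(20):
    theta = mgs_step(theta)
print(theta, task_loss(theta))   # drifts toward the target as the task loss drops
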
Mode recovery in neural autoregressive sequence modeling
TLDR: It is concluded that future research must consider the entire learning chain of the ground-truth, empirical, learned and decoding-induced distributions in order to fully understand the potentials and perils and to further improve neural autoregressive sequence models.
An Information Divergence Measure Between Neural Text and Human Text
TLDR: Mauve is proposed, a comparison measure for open-ended text generation, which directly compares a generation model’s distribution to that of human-written text; it identifies known properties of generated text, scales naturally with model size, and correlates with human judgments, with fewer restrictions than existing distributional evaluation metrics.
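Both MAUVE entries above describe comparing a model's text distribution to the human one through a divergence curve. The sketch below traces such a curve for two toy histograms by mixing the distributions and measuring scaled KL divergences in both directions, then summarizing the curve by its area. The real measure operates on quantized language-model embeddings of text samples and uses a calibrated scaling constant, so the histograms and the constant c here are illustrative assumptions only.

import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete distributions (smoothed)."""
    p = p + eps
    q = q + eps
    return float(np.sum(p * np.log(p / q)))

def divergence_curve(p, q, c=5.0, n=99):
    """For mixtures r = lam*p + (1-lam)*q, trace the points
    (exp(-c*KL(q||r)), exp(-c*KL(p||r))) as lam sweeps (0, 1)."""
    lams = np.linspace(0.01, 0.99, n)
    xs = np.array([np.exp(-c * kl(q, lam * p + (1 - lam) * q)) for lam in lams])
    ys = np.array([np.exp(-c * kl(p, lam * p + (1 - lam) * q)) for lam in lams])
    return xs, ys

def area_under_curve(xs, ys):
    """Summarize the curve with a trapezoid-rule area."""
    order = np.argsort(xs)
    x, y = xs[order], ys[order]
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# Hypothetical histograms over quantized text embeddings (machine vs. human).
p_machine = np.array([0.4, 0.4, 0.1, 0.1])
q_human   = np.array([0.3, 0.3, 0.2, 0.2])
xs, ys = divergence_curve(p_machine, q_human)
print(round(area_under_curve(xs, ys), 3))   # closer to 1 the closer the two match
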
Extractive and Abstractive Explanations for Fact-Checking and Evaluation of News
TLDR: This paper investigates the construction of natural language explanations for news claims with the goal of assisting fact-checking and news evaluation applications, and finds that the extractive method shows the most promise.

References

Showing 1-10 of 30 references
Length bias in Encoder Decoder Models and a Case for Global Conditioning
TLDR: This paper shows that a globally conditioned model alleviates the above problems of encoder-decoder models and eliminates the need for a beam search during inference, which reduces to an efficient dot-product based search in a vector space.
Regularizing and Optimizing LSTM Language Models
TLDR: This paper proposes the weight-dropped LSTM, which uses DropConnect on hidden-to-hidden weights as a form of recurrent regularization, and introduces NT-ASGD, a variant of the averaged stochastic gradient method wherein the averaging trigger is determined using a non-monotonic condition as opposed to being tuned by the user.
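A minimal sketch of the DropConnect idea from the entry above, applied to a plain tanh RNN rather than the paper's LSTM for brevity: individual hidden-to-hidden weights are zeroed with probability p and the survivors rescaled, and the same dropped matrix is reused at every time step of a forward pass. The cell type, shapes, and initialization are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def weight_drop(w, p):
    """DropConnect: zero individual weights with probability p and rescale
    the survivors; in the cited paper this is applied to the LSTM's
    hidden-to-hidden matrices."""
    mask = (rng.random(w.shape) >= p).astype(w.dtype)
    return w * mask / (1.0 - p)

def rnn_forward(xs, w_ih, w_hh, dropconnect_p=0.5, training=True):
    """Simple tanh RNN; one dropped hidden-to-hidden matrix is sampled per
    forward pass and reused across all time steps."""
    w_hh_used = weight_drop(w_hh, dropconnect_p) if training else w_hh
    h = np.zeros(w_hh.shape[0])
    states = []
    for x in xs:
        h = np.tanh(w_ih @ x + w_hh_used @ h)
        states.append(h)
    return np.stack(states)

# Hypothetical sizes: 4-dim inputs, 8-dim hidden state, sequence length 5.
w_ih = 0.1 * rng.standard_normal((8, 4))
w_hh = 0.1 * rng.standard_normal((8, 8))
xs = rng.standard_normal((5, 4))
print(rnn_forward(xs, w_ih, w_hh).shape)   # (5, 8)
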
Neural Text Generation with Unlikelihood Training
TLDR: It is shown that the likelihood objective itself is at fault, resulting in a model that assigns too much probability to sequences containing repeats and frequent words, unlike those from the human training distribution; the proposed unlikelihood training objective penalizes such generations, providing a strong alternative to existing techniques.
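A minimal numpy sketch of the token-level unlikelihood term this entry alludes to: the usual negative log-likelihood plus a penalty that lowers the probability of negative candidate tokens, taken here (as in the paper's token-level variant) to be the previously seen context tokens excluding the current target. The per-step distributions and vocabulary are hypothetical, and in practice the probabilities come from the model being trained.

import numpy as np

def unlikelihood_loss(step_probs, targets, alpha=1.0, eps=1e-12):
    """Negative log-likelihood of each target token plus an 'unlikelihood'
    penalty, -log(1 - p(c)), for every negative candidate c at that step."""
    total = 0.0
    for t, (p, x_t) in enumerate(zip(step_probs, targets)):
        nll = -np.log(p[x_t] + eps)
        candidates = set(targets[:t]) - {x_t}    # earlier context tokens
        penalty = -sum(np.log(1.0 - p[c] + eps) for c in candidates)
        total += nll + alpha * penalty
    return total / len(targets)

# Hypothetical per-step distributions over a 5-token vocabulary.
probs = [np.array([0.7, 0.1, 0.1, 0.05, 0.05]),
         np.array([0.6, 0.2, 0.1, 0.05, 0.05]),
         np.array([0.5, 0.3, 0.1, 0.05, 0.05])]
targets = [0, 2, 1]   # tokens 0 and 2 become negative candidates at the last step
print(round(unlikelihood_loss(probs, targets), 3))
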
Recurrent Neural Networks as Weighted Language Recognizers
TLDR: It is shown that approximations and heuristic algorithms are necessary in practical applications of single-layer, ReLU-activation, rational-weight RNNs with softmax, which are commonly used in natural language processing applications.
Generating Text with Recurrent Neural Networks
TLDR: The power of RNNs trained with the new Hessian-Free optimizer is demonstrated by applying them to character-level language modeling tasks, and a new RNN variant is introduced that uses multiplicative connections, which allow the current input character to determine the transition matrix from one hidden state vector to the next.
Sequence Level Training with Recurrent Neural Networks
TLDR: This work proposes a novel sequence level training algorithm that directly optimizes the metric used at test time, such as BLEU or ROUGE, and outperforms several strong baselines for greedy generation.
The Curious Case of Neural Text Degeneration
TLDR: By sampling text from the dynamic nucleus of the probability distribution, which allows for diversity while effectively truncating the less reliable tail of the distribution, the resulting text better demonstrates the quality of human text, yielding enhanced diversity without sacrificing fluency and coherence.
Attention is All you Need
TLDR: A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, having been applied successfully to English constituency parsing with both large and limited training data.
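The Transformer's core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. The sketch below implements only that single-head operation in numpy; the learned projections, multi-head concatenation, masking, and the rest of the architecture are omitted, and the shapes are illustrative.

import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ v

# Hypothetical shapes: 3 queries attending over 4 key/value pairs of dimension 8.
rng = np.random.default_rng(0)
q = rng.standard_normal((3, 8))
k = rng.standard_normal((4, 8))
v = rng.standard_normal((4, 8))
print(scaled_dot_product_attention(q, k, v).shape)   # (3, 8)
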
An Actor-Critic Algorithm for Sequence Prediction
TLDR: An approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL) is presented, in which the critic network is conditioned on the ground-truth output; this method leads to improved performance on both a synthetic task and German-English machine translation.
Neural Machine Translation by Jointly Learning to Align and Translate
TLDR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.