Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation

@inproceedings{Yang2018BreakingTB,
  title={Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation},
  author={Yilin Yang and Liang Huang and M. Ma},
  booktitle={EMNLP},
  year={2018}
}
Beam search is widely used in neural machine translation, and usually improves translation quality compared to greedy search. It has been widely observed, however, that beam sizes larger than 5 hurt translation quality. We explain why this happens, and propose several methods to address this problem. Furthermore, we discuss the optimal stopping criteria for these methods. Results show that our hyperparameter-free methods outperform the widely-used hyperparameter-free heuristic of length normalization …
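To make the length bias behind the beam search curse concrete, the following is a minimal sketch (not the paper's proposed methods) contrasting raw log-probability scoring with the length-normalization heuristic mentioned above; the toy hypotheses and scores are illustrative assumptions.

```python
# Two ways to pick the final hypothesis from a beam of (tokens, sum_logprob) pairs.
# Illustrative sketch only: real systems score candidates with a trained NMT model.

def raw_score(hyp):
    tokens, sum_logprob = hyp
    return sum_logprob                              # accumulates only negative terms

def length_normalized_score(hyp, alpha=1.0):
    tokens, sum_logprob = hyp
    return sum_logprob / (len(tokens) ** alpha)     # per-word average when alpha = 1

beam = [
    (["the", "cat", "sat", "on", "the", "mat", "."], -7.0),   # longer, lower total score
    (["ok", "."], -2.5),                                      # short, higher raw score
]

print(max(beam, key=raw_score)[0])                  # picks the short hypothesis
print(max(beam, key=length_normalized_score)[0])    # picks the longer hypothesis
```

Because every added token can only lower the summed log-probability, the raw score systematically favors shorter candidates; larger beams surface more of these short, inadequate hypotheses, which is the bias that rescoring methods and stopping criteria aim to counteract.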
Learning to Stop in Structured Prediction for Neural Machine Translation
A novel ranking method is proposed which enables an optimal beam search stopping criterion, and a structured prediction loss function is introduced which penalizes suboptimal finished candidates produced by beam search during training.
Correcting Length Bias in Neural Machine Translation
It is shown that correcting the brevity problem almost eliminates the beam problem; some commonly-used methods for doing this are compared, finding that a simple per-word reward works well, and a simple and quick way to tune this reward using the perceptron algorithm is introduced.
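As a rough illustration of the per-word reward and perceptron-style tuning described above (a sketch under assumed data structures, not the authors' implementation):

```python
# score(y) = log P(y | x) + reward * len(y); the tuning loop nudges the reward
# until output lengths roughly match reference lengths. Names and the update
# rule shown here are illustrative assumptions.

def rewarded_score(sum_logprob, length, reward):
    return sum_logprob + reward * length

def pick(beam, reward):
    # beam: list of (tokens, sum_logprob) pairs; return the highest-scoring one
    return max(beam, key=lambda h: rewarded_score(h[1], len(h[0]), reward))

def tune_reward(dev_beams, ref_lengths, lr=0.01, epochs=10):
    reward = 0.0
    for _ in range(epochs):
        for beam, ref_len in zip(dev_beams, ref_lengths):
            hyp_len = len(pick(beam, reward)[0])
            reward += lr * (ref_len - hyp_len)   # raise the reward if outputs are too short
    return reward
```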
Reducing Length Bias in Scoring Neural Machine Translation via a Causal Inference Method
Neural machine translation (NMT) usually employs beam search to expand the search space and obtain more translation candidates. However, increasing the beam size often suffers from plenty of …
Rethinking the Evaluation of Neural Machine Translation
This paper proposes a novel evaluation protocol, which not only avoids the effect of search errors but also provides a system-level evaluation from the perspective of model ranking, based on the newly proposed exact top-k decoding instead of beam search.
Checkpoint Reranking: An Approach to Select Better Hypothesis for Neural Machine Translation Systems
A method of re-ranking the outputs of Neural Machine Translation (NMT) systems is proposed that focuses solely on the decoder's ability to generate distinct tokens, without the use of any language model or external data.
Investigating Label Bias in Beam Search for Open-ended Text Generation
Empirical evidence is provided that label bias is a major reason for such degenerate behaviors of beam search, and that by combining locally normalized maximum likelihood estimation with globally normalized sequence-level training, label bias can be reduced with almost no sacrifice in perplexity.
On NMT Search Errors and Model Errors: Cat Got Your Tongue?
It is concluded that vanilla NMT in its current form requires just the right amount of beam search errors, which, from a modelling perspective, is a highly unsatisfactory conclusion indeed, as the model often prefers an empty translation.
Empirical Analysis of Beam Search Performance Degradation in Neural Sequence Models
It is found that increasing the beam width leads to sequences that are disproportionately based on early, very low probability tokens that are followed by a sequence of tokens with higher (conditional) probability, and it is shown that such sequences are more likely to have a lower evaluation score than lower probability sequences without this pattern.
Revisiting the Weaknesses of Reinforcement Learning for Neural Machine Translation
These experiments on in-domain and cross-domain adaptation reveal the importance of exploration and reward scaling, and provide empirical counter-evidence to the claim of Choshen et al. (2020) that the success of policy gradient algorithms is determined by the shape of output distributions rather than the reward.
An in-depth Study of Neural Machine Translation Performance
With the rise of deep learning and its rapidly increasing popularity, neural machine translation (NMT) has become one of the major research areas. Sequence-to-sequence models are widely used in NMT …

References

Correcting Length Bias in Neural Machine Translation
It is shown that correcting the brevity problem almost eliminates the beam problem; some commonly-used methods for doing this are compared, finding that a simple per-word reward works well, and a simple and quick way to tune this reward using the perceptron algorithm is introduced.
When to Finish? Optimal Beam Search for Neural Text Generation (modulo beam size)
A provably optimal beam search algorithm is proposed that always returns the optimal-score complete hypothesis (modulo beam size) and finishes as soon as optimality is established.
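The "finish as soon as optimality is established" condition can be sketched as a simple check, assuming pure log-probability scores so that extending a hypothesis can never raise its score (names are illustrative):

```python
# Safe to stop once no unfinished hypothesis could still beat the best finished one.
# With log-probabilities (<= 0) and no length reward, an unfinished hypothesis's
# current score is an upper bound on the score of any of its completions.

def can_stop(best_finished_score, unfinished_scores):
    return all(best_finished_score >= s for s in unfinished_scores)
```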
Analyzing Uncertainty in Neural Machine Translation
This study proposes tools and metrics to assess how uncertainty in the data is captured by the model distribution and how it affects search strategies that generate translations, and shows that search works remarkably well but that models tend to spread too much probability mass over the hypothesis space.
Sequence-to-Sequence Learning as Beam-Search Optimization
This work introduces a model and beam-search training scheme, based on the work of Daume III and Marcu (2005), that extends seq2seq to learn global sequence scores and shows that this system outperforms a highly-optimized attention-based seq2seq system and other baselines on three different sequence-to-sequence tasks: word ordering, parsing, and machine translation.
Minimum Risk Training for Neural Machine Translation
Experiments show that the proposed minimum risk training approach achieves significant improvements over maximum likelihood estimation on a state-of-the-art neural machine translation system across various language pairs.
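A minimal sketch of the expected-risk objective that minimum risk training optimizes over a sampled candidate set (the sharpness parameter and inputs are assumptions, not the paper's exact setup):

```python
import math

def expected_risk(cand_logprobs, cand_risks, alpha=5e-3):
    # cand_logprobs[i]: model log P(y_i | x); cand_risks[i]: risk of candidate y_i,
    # e.g. 1 - sentence-level BLEU against the reference.
    weights = [math.exp(alpha * lp) for lp in cand_logprobs]
    z = sum(weights)
    q = [w / z for w in weights]                 # renormalized distribution over samples
    return sum(qi * ri for qi, ri in zip(q, cand_risks))
```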
Improved Neural Machine Translation with SMT Features
The proposed method incorporates statistical machine translation (SMT) features, such as a translation model and an n-gram language model, with the NMT model under the log-linear framework, and significantly improves the translation quality of a state-of-the-art NMT system on Chinese-to-English translation tasks.
Neural Machine Translation with Reconstruction
Experiments show that the proposed framework significantly improves the adequacy of NMT output and achieves superior translation results over state-of-the-art NMT and statistical MT systems.
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
GNMT, Google's Neural Machine Translation system, is presented, which attempts to address many of the weaknesses of conventional phrase-based translation systems and provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models.
Neural Machine Translation by Jointly Learning to Align and Translate
It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Sequence Level Training with Recurrent Neural Networks
This work proposes a novel sequence level training algorithm that directly optimizes the metric used at test time, such as BLEU or ROUGE, and outperforms several strong baselines for greedy generation.