Beam Search Strategies for Neural Machine Translation

@inproceedings{Freitag2017BeamSS,
  title={Beam Search Strategies for Neural Machine Translation},
  author={Markus Freitag and Yaser Al-Onaizan},
  booktitle={NMT@ACL},
  year={2017}
}
The basic concept in Neural Machine Translation (NMT) is to train a large neural network that maximizes the translation performance on a given parallel corpus. NMT then uses a simple left-to-right beam search decoder to generate new translations that approximately maximize the trained conditional probability. The current beam search strategy generates the target sentence word by word from left to right while keeping a fixed number of active candidates at each time step. First, this simple…
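The left-to-right beam search the abstract refers to can be summarized in a short sketch. This is an illustrative outline only; the model interface step_log_probs, the bos/eos markers, and the default beam size are assumptions, not the authors' implementation.

def beam_search(step_log_probs, bos, eos, beam_size=5, max_len=50):
    """Simple left-to-right beam search.

    step_log_probs(prefix) is assumed to return a dict mapping each
    candidate next token to its conditional log-probability given prefix.
    """
    beam = [([bos], 0.0)]          # (token list, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        # Expand every active hypothesis by one token.
        candidates = []
        for prefix, score in beam:
            for tok, lp in step_log_probs(prefix).items():
                candidates.append((prefix + [tok], score + lp))
        candidates.sort(key=lambda h: h[1], reverse=True)
        # Keep only the beam_size best partial hypotheses; completed ones are set aside.
        beam = []
        for hyp in candidates:
            if hyp[0][-1] == eos:
                finished.append(hyp)
            else:
                beam.append(hyp)
            if len(beam) == beam_size:
                break
        if not beam:               # every surviving hypothesis has ended
            break
    finished.extend(beam)          # fall back to unfinished hypotheses if needed
    return max(finished, key=lambda h: h[1])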

Citations

Incremental Beam Manipulation for Natural Language Generation

The performance of natural language generation systems has improved substantially with modern neural networks. At test time they typically employ beam search to avoid locally optimal but globally suboptimal predictions.

Improving Arabic neural machine translation via n-best list re-ranking

A set of new re-ranking features is proposed that can be extracted directly from the parallel corpus without needing any external tools and that takes into account lexical, syntactic, and even semantic aspects of the n-best list translations.

Exploring Recombination for Efficient Decoding of Neural Machine Translation

This work introduces recombination into NMT decoding based on the concept of the "equivalence" of partial hypotheses, uses a simple n-gram suffix-based equivalence function to adapt it to beam search decoding, and shows that the proposed method obtains similar translation quality with a smaller beam size, making NMT decoding more efficient.
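The recombination idea described above can be illustrated with a small sketch: hypotheses whose last n tokens are identical are treated as equivalent, and only the highest-scoring one survives. The function name, the choice of n, and the (tokens, score) hypothesis format are assumptions for illustration, not details taken from the cited paper.

def recombine(hypotheses, n=4):
    """Keep only the best-scoring hypothesis per n-gram suffix.

    hypotheses is a list of (token list, log-probability score) pairs;
    two hypotheses sharing the same last n tokens are considered
    equivalent, and the weaker one is pruned before the beam is expanded.
    """
    best = {}
    for tokens, score in hypotheses:
        suffix = tuple(tokens[-n:])
        if suffix not in best or score > best[suffix][1]:
            best[suffix] = (tokens, score)
    return list(best.values())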

Single-Queue Decoding for Neural Machine Translation

Despite its simplicity, the proposed decoding algorithm is shown to select hypotheses of better quality and to improve translation performance.

Rethinking the Evaluation of Neural Machine Translation

This paper proposes a novel evaluation protocol, which not only avoids the effect of search errors but also provides a system-level evaluation from the perspective of model ranking, based on the newly proposed exact top-k decoding instead of beam search.

Decoding Strategies for Neural Referring Expression Generation

It is found that most beam search heuristics developed for neural MT do not generalize well to referring expression generation (REG) and do not generally outperform greedy decoding; a recent approach that uses a small network to modify the RNN's hidden state for better decoding results is also explored.

Simplifying Encoder-Decoder-Based Neural Machine Translation Systems to Translate between Related Languages

The simplification of state-of-the-art sequence-to-sequence neural machine translation with attention is explored, resulting in a greatly simplified network that reduces the number of trainable parameters from roughly 12.2 million to 9.8 million and the training time from 22h 53m to 12h 15m.

Using Context in Neural Machine Translation Training Objectives

It is demonstrated that training is more robust with document-level metrics than with sequence-level metrics, and improvements are shown on NMT using TER and on Grammatical Error Correction using GLEU, both metrics applied at the document level for evaluation.

Leveraging Sentence Similarity in Natural Language Generation: Improving Beam Search using Range Voting

The proposed method can be applied when generating from any probabilistic language model, including n-gram models and neural network models, and the outputs it generates are found to be rated higher.

Empirical Analysis of Beam Search Performance Degradation in Neural Sequence Models

It is found that increasing the beam width leads to sequences that disproportionately begin with very low-probability tokens followed by tokens with higher (conditional) probability, and it is shown that such sequences are more likely to receive a lower evaluation score than lower-probability sequences without this pattern.
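A tiny, made-up numerical example can make this pattern concrete: a hypothesis that opens with a very unlikely token but continues with near-certain tokens can accumulate a higher total log-probability than a uniformly plausible hypothesis, and only a wider beam keeps that unlikely opening around long enough to be selected. All probabilities below are invented for illustration.

import math

def seq_log_prob(probs):
    # Total log-probability of a sequence given its per-token probabilities.
    return sum(math.log(p) for p in probs)

fluent = [0.5, 0.5, 0.5, 0.5]     # a reasonable token at every step
odd = [0.1, 0.99, 0.99, 0.99]     # very unlikely opening, then near-certain tokens

print(round(seq_log_prob(fluent), 2))  # -2.77
print(round(seq_log_prob(odd), 2))     # -2.33, the higher overall score

# A beam of width 1 discards the 0.1 opening at the first step, but a wider
# beam keeps it alive long enough for its high-confidence continuation to
# overtake the fluent hypothesis, even though the result may score worse on
# evaluation metrics such as BLEU.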
...

References

Showing 1-10 of 25 references

Improved beam search with constrained softmax for NMT

An improved beam search decoding algorithm with constrained softmax operations for neural machine translation (NMT) that translates about 117 words per second, beating the real-time translation requirements for practical MT systems.

Addressing the Rare Word Problem in Neural Machine Translation

This paper proposes and implements an effective technique to address the inability of end-to-end neural machine translation to correctly translate very rare words, and is the first to surpass the best result achieved on a WMT'14 contest task.

Neural Machine Translation by Jointly Learning to Align and Translate

It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
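The (soft-)search described above amounts to computing a weighted average over the encoder's source annotations. The minimal sketch below uses a dot-product score purely for illustration; the cited paper learns the alignment scores with a small feed-forward network rather than a plain dot product.

import numpy as np

def soft_attention(query, annotations):
    """Return a context vector as an attention-weighted sum of source annotations.

    query is the decoder state of shape (d,); annotations has shape (src_len, d).
    Dot-product scoring stands in here for the learned alignment model.
    """
    scores = annotations @ query              # one score per source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over source positions
    return weights @ annotations              # context vector of shape (d,)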

On Using Very Large Target Vocabulary for Neural Machine Translation

It is shown that decoding can be efficiently done even with the model having a very large target vocabulary by selecting only a small subset of the whole target vocabulary.

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

GNMT, Google's Neural Machine Translation system, is presented, which attempts to address many of the weaknesses of conventional phrase-based translation systems and provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models.

Neural Machine Translation of Rare Words with Subword Units

This paper introduces a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units, and empirically shows that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English-German and English-Russian by 1.3 BLEU.
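The subword-unit approach summarized above can be sketched with a minimal byte-pair-encoding style merge loop: the most frequent adjacent symbol pair is merged repeatedly, so frequent words end up as single units while rare words stay split into smaller pieces. This is a toy illustration of the general technique under assumed input conventions, not the released subword-nmt implementation.

from collections import Counter

def learn_bpe(word_freqs, num_merges):
    # word_freqs maps each word (a string) to its corpus frequency.
    vocab = {tuple(w): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # most frequent adjacent pair
        merges.append(best)
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges

# Example: learn_bpe({"lower": 5, "newest": 6, "widest": 3}, num_merges=10)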

Vocabulary Manipulation for Neural Machine Translation

This paper introduces a sentence-level or batch-level vocabulary that is only a very small subset of the full output vocabulary for each sentence or batch, reducing both the computing time and the memory usage of neural machine translation models.
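A batch-level vocabulary of this kind can be sketched as the union of target tokens that are plausible for the sentences in the batch. The candidate-selection heuristic below, a word-alignment dictionary plus a list of always-included frequent target tokens, is an assumption used only to make the sketch concrete; the softmax would then be computed over the returned set instead of the full target vocabulary.

def batch_vocabulary(source_batch, align_dict, top_tokens):
    """Build a small output vocabulary for one batch.

    align_dict is assumed to map each source token to a set of likely
    target tokens (e.g. from word alignments); top_tokens lists the
    globally most frequent target tokens, which are always included.
    """
    vocab = set(top_tokens)
    for sentence in source_batch:
        for token in sentence:
            vocab.update(align_dict.get(token, ()))
    return sorted(vocab)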

Sequence to Sequence Learning with Neural Networks

This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and the target sentence which made the optimization problem easier.

Recurrent Continuous Translation Models

We introduce a class of probabilistic continuous translation models called Recurrent Continuous Translation Models that are purely based on continuous representations for words, phrases and sentences.

Sequence Transduction with Recurrent Neural Networks

This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence.