The Edit Distance Transducer in Action: The University of Cambridge English-German System at WMT16

@inproceedings{Stahlberg2016TheED,
  title={The Edit Distance Transducer in Action: The University of Cambridge English-German System at WMT16},
  author={Felix Stahlberg and Eva Hasler and Bill Byrne},
  booktitle={WMT},
  year={2016}
}
This paper presents the University of Cambridge submission to WMT16. Motivated by the complementary nature of syntactical machine translation and neural machine translation (NMT), we exploit the synergies of Hiero and NMT in different combination schemes. Starting out with a simple neural lattice rescoring approach, we show that the Hiero lattices are often too narrow for NMT ensembles. Therefore, instead of a hard restriction of the NMT search space to the lattice, we propose to loosely couple… 
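
The abstract is truncated here, but the "simple neural lattice rescoring approach" it starts from is easy to illustrate. The sketch below re-ranks hypotheses extracted from a Hiero lattice with a log-linear interpolation of SMT and NMT scores; nmt_logprob and the weight lam are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch of lattice/n-best rescoring: interpolate the Hiero (SMT)
# model score of each hypothesis with an NMT log-probability.
def rescore(hypotheses, nmt_logprob, lam=0.5):
    """hypotheses: list of (tokens, smt_score) pairs read off the lattice.
    nmt_logprob: hypothetical stand-in for an NMT ensemble scorer."""
    scored = []
    for tokens, smt_score in hypotheses:
        combined = (1.0 - lam) * smt_score + lam * nmt_logprob(tokens)
        scored.append((combined, tokens))
    return max(scored)  # highest combined score wins

# Toy usage with a dummy NMT scorer that penalizes length.
hyps = [(["das", "ist", "gut"], -2.3), (["das", "ist", "sehr", "gut"], -2.1)]
print(rescore(hyps, lambda toks: -0.5 * len(toks)))
```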

Citations

Neural Machine Translation by Minimising the Bayes-risk with Respect to Syntactic Translation Lattices

An efficient and simple way to integrate risk estimation into the NMT decoder which is suitable for word-level as well as subword-unit-level NMT and produces entirely new hypotheses far beyond simply rescoring the SMT search space or fixing UNKs in the NMT output.
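
The Bayes-risk idea is compact enough to sketch. Under strong simplifications (an explicit hypothesis list with posterior probabilities instead of a translation lattice, and crude unigram overlap instead of the paper's lattice-based n-gram posteriors), minimum Bayes-risk decoding picks the hypothesis with the highest expected gain under the model distribution:

```python
from collections import Counter

def overlap(hyp, evi):
    """Crude unigram-overlap gain; the paper instead uses n-gram posteriors
    computed over the syntactic translation lattice."""
    h, e = Counter(hyp), Counter(evi)
    return sum((h & e).values())

def mbr_decode(hypotheses):
    """hypotheses: list of (tokens, posterior). Return the hypothesis with
    maximum expected gain, i.e. minimum Bayes risk, under the posterior."""
    return max(
        (h for h, _ in hypotheses),
        key=lambda h: sum(p * overlap(h, e) for e, p in hypotheses),
    )

# The consensus translation can differ from the highest-posterior one.
hyps = [(["a", "b"], 0.5), (["a", "c"], 0.3), (["a", "b", "c"], 0.2)]
print(mbr_decode(hyps))  # ['a', 'b', 'c']
```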

Neural Machine Translation: A Review and Survey

This work traces back the origins of modern NMT architectures to word and sentence embeddings and earlier examples of the encoder-decoder network family and concludes with a survey of recent trends in the field.

The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction

It is shown how traditional symbolic statistical machine translation models can still improve neural machine translation while reducing the risk of common pathologies of NMT such as hallucinations and neologisms.

CUED@WMT19: EWC&LMs

Gains are reported from fine-tuning very strong baselines on former WMT test sets using a combination of checkpoint averaging and EWC, and from extracting n-gram probabilities from SMT lattices, which can be seen as a source-conditioned n-gram LM.
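
Checkpoint averaging and the EWC penalty are both simple to write down. Below is a minimal NumPy sketch, with checkpoints modeled as plain dicts of parameter arrays; the key layout and the Fisher estimates are assumptions, not the paper's code.

```python
import numpy as np

def average_checkpoints(checkpoints):
    """checkpoints: list of {param_name: np.ndarray} with identical keys.
    Returns the element-wise mean of every parameter across checkpoints."""
    return {k: np.mean([c[k] for c in checkpoints], axis=0)
            for k in checkpoints[0]}

def ewc_penalty(params, anchor, fisher, lam):
    """Elastic Weight Consolidation: lam/2 * sum_i F_i * (theta_i - theta*_i)^2,
    added to the fine-tuning loss to keep parameters close to the anchor model."""
    return 0.5 * lam * sum(
        float(np.sum(fisher[k] * (params[k] - anchor[k]) ** 2)) for k in params
    )
```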

Unsupervised Pretraining for Sequence to Sequence Learning

This work presents a general unsupervised learning method to improve the accuracy of sequence to sequence (seq2seq) models: the weights of the encoder and decoder are initialized with the weights of two pretrained language models and then fine-tuned with labeled data.
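
The mechanism is plain weight initialization: copy source-LM weights into the encoder and target-LM weights into the decoder, then train on labeled pairs. A schematic sketch with models as dicts of parameter arrays; the key names are hypothetical and would differ per framework.

```python
def init_seq2seq_from_lms(seq2seq, src_lm, tgt_lm):
    """Copy pretrained LM parameters into the matching seq2seq submodules.
    All models are dicts of parameter arrays here; the key layout is a
    hypothetical simplification of framework-specific state dicts."""
    for name, w in src_lm.items():             # e.g. "embedding", "rnn.weight"
        seq2seq["encoder." + name] = w.copy()  # source LM initializes encoder
    for name, w in tgt_lm.items():
        seq2seq["decoder." + name] = w.copy()  # target LM initializes decoder
    return seq2seq                             # then fine-tune on labeled pairs
```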

From Feature to Paradigm: Deep Learning in Machine Translation (Extended Abstract)

This extended abstract focuses on describing the foundational works on the neural MT approach, mentioning its strengths and weaknesses, and including an analysis of the corresponding challenges and future work.

From Feature To Paradigm: Deep Learning In Machine Translation

The new neural MT approach is reported together with a description of the foundational related works and recent approaches using subwords, characters, and multilingual training, among others.

Findings of the 2016 Conference on Machine Translation

This paper presents the results of the WMT16 shared tasks, which included five machine translation (MT) tasks (standard news, IT-domain, biomedical, multimodal, pronoun), three evaluation tasks (metrics, tuning, run-time estimation of MT quality), an automatic post-editing task, and a bilingual document alignment task.

References

Showing 1-10 of 37 references

Syntactically Guided Neural Machine Translation

With a slightly modified NMT beam-search decoder, this work finds gains over both Hiero and NMT decoding alone, with practical advantages in extending NMT to very large input and output vocabularies.
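
The guided decoder can be pictured as beam search restricted to lattice arcs. A minimal sketch, with the lattice as a DFA-like dict and a hypothetical logprob function standing in for the NMT ensemble score:

```python
def lattice_beam_search(start, lattice, finals, logprob, beam=4, max_len=20):
    """Beam search restricted to lattice arcs.
    lattice: {state: {token: next_state}} (a DFA-like acceptor);
    finals: set of accepting states;
    logprob(prefix, token): hypothetical NMT conditional log-probability."""
    beams = [(0.0, [], start)]                 # (score, tokens, state)
    complete = []
    for _ in range(max_len):
        candidates = []
        for score, toks, state in beams:
            for tok, nxt in lattice.get(state, {}).items():  # lattice arcs only
                candidates.append((score + logprob(toks, tok), toks + [tok], nxt))
        if not candidates:
            break
        beams = sorted(candidates, reverse=True)[:beam]
        complete += [b for b in beams if b[2] in finals]
    return max(complete) if complete else None

# Toy acceptor lattice with two competing final arcs.
lattice = {0: {"das": 1}, 1: {"ist": 2}, 2: {"gut": 3, "toll": 3}}
print(lattice_beam_search(0, lattice, {3}, lambda p, t: -0.1 * len(t)))
```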

Hierarchical Phrase-Based Translation with Weighted Finite-State Transducers and Shallow-n Grammars

HiFST, a lattice-based decoder for hierarchical phrase-based translation and alignment, is described; using WFSTs rather than k-best lists requires less pruning in translation search, resulting in fewer search errors, better parameter optimization, and improved translation performance.

Neural Reranking Improves Subjective Quality of Machine Translation: NAIST at WAT2015

A detailed analysis of the gains from neural MT reranking finds that the main contribution of the neural models lies in improved grammatical correctness of the output, rather than in better lexical choice of content words.

Neural Machine Translation of Rare Words with Subword Units

This paper introduces a simpler and more effective approach that makes the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units, and empirically shows that subword models improve over a back-off dictionary baseline on the WMT 15 English-German and English-Russian translation tasks by up to 1.3 BLEU.
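
The BPE procedure described in this paper is short enough to sketch in full: start from a character-level vocabulary and repeatedly merge the most frequent adjacent symbol pair. The toy corpus below mirrors the style of the paper's example; this is a didactic reimplementation, not the released subword-nmt code.

```python
from collections import Counter

def pair_stats(vocab):
    """Count adjacent symbol pairs over a {(sym, ..., "</w>"): freq} vocabulary."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Rewrite every word, replacing the pair with its concatenation."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus; learn 10 merge operations.
vocab = {("l", "o", "w", "</w>"): 5, ("l", "o", "w", "e", "r", "</w>"): 2,
         ("n", "e", "w", "e", "s", "t", "</w>"): 6,
         ("w", "i", "d", "e", "s", "t", "</w>"): 3}
for _ in range(10):
    stats = pair_stats(vocab)
    best = max(stats, key=stats.get)
    vocab = merge_pair(best, vocab)
    print(best)
```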

Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models

A novel word-character solution to achieving open-vocabulary NMT is presented; it successfully learns not only to generate well-formed words for Czech, a highly inflected language with a very complex vocabulary, but also to build correct representations for English source words.

Character-based Neural Machine Translation

A neural MT system is proposed that uses character-based embeddings in combination with convolutional and highway layers to replace the standard lookup-based word representations, providing improved results even when the source language is not morphologically rich.
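
The recipe (character embeddings, a convolution, max-over-time pooling, a highway layer) is standard enough to sketch in PyTorch; the hyperparameters below are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class CharWordEncoder(nn.Module):
    """Build a word representation from its characters:
    char embeddings -> 1D convolution -> max-over-time pool -> highway layer."""
    def __init__(self, n_chars=100, char_dim=16, word_dim=64, kernel=3):
        super().__init__()
        self.emb = nn.Embedding(n_chars, char_dim)
        self.conv = nn.Conv1d(char_dim, word_dim, kernel, padding=1)
        self.gate = nn.Linear(word_dim, word_dim)   # highway transform gate
        self.proj = nn.Linear(word_dim, word_dim)

    def forward(self, char_ids):                    # (batch, word_len)
        x = self.emb(char_ids).transpose(1, 2)      # (batch, char_dim, word_len)
        h = torch.relu(self.conv(x)).max(dim=2).values  # max over time
        t = torch.sigmoid(self.gate(h))             # highway: t*H(h) + (1-t)*h
        return t * torch.relu(self.proj(h)) + (1 - t) * h

enc = CharWordEncoder()
words = torch.randint(0, 100, (8, 12))              # 8 words, 12 chars each
print(enc(words).shape)                             # torch.Size([8, 64])
```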

Recurrent Continuous Translation Models

We introduce a class of probabilistic continuous translation models called Recurrent Continuous Translation Models that are purely based on continuous representations for words, phrases and sentences.

Variable-Length Word Encodings for Neural Translation Models

This work proposes and compares three variable-length encoding schemes that represent a large vocabulary corpus using a much smaller vocabulary with no loss of information, improving WMT English-French translation performance by up to 1.7 BLEU.
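
The flavour of such encodings is easy to demonstrate. The sketch below spells large word IDs as variable-length sequences of base-k "digit" tokens from a tiny vocabulary; this is a generic illustration of lossless vocabulary reduction, not the paper's exact schemes.

```python
def encode_word(word_id, base=16, prefix="B"):
    """Spell a large-vocabulary word ID as a short sequence of digit tokens
    drawn from a small vocabulary of `base` symbols (lossless, variable length)."""
    digits = []
    while True:
        digits.append(f"{prefix}{word_id % base}")
        word_id //= base
        if word_id == 0:
            break
    return list(reversed(digits))

def decode_word(tokens, base=16, prefix="B"):
    """Invert encode_word: fold the digit tokens back into an integer ID."""
    word_id = 0
    for tok in tokens:
        word_id = word_id * base + int(tok[len(prefix):])
    return word_id

assert decode_word(encode_word(48813)) == 48813
```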

Neural Machine Translation by Jointly Learning to Align and Translate

It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
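
This soft-search is the now-standard additive attention. A NumPy sketch of a single decoder step, with random matrices standing in for learned parameters:

```python
import numpy as np

def additive_attention(s, H, Wa, Ua, v):
    """One step of Bahdanau-style attention.
    s: decoder state (d,); H: encoder states (T, d).
    Returns the context vector (d,) and attention weights (T,)."""
    scores = np.tanh(s @ Wa + H @ Ua) @ v   # e_j = v^T tanh(Wa s + Ua h_j)
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                    # softmax over source positions
    return alpha @ H, alpha                 # context = sum_j alpha_j h_j

rng = np.random.default_rng(0)
d, T = 8, 5
s, H = rng.normal(size=d), rng.normal(size=(T, d))
Wa, Ua, v = rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d)
context, alpha = additive_attention(s, H, Wa, Ua, v)
print(alpha.round(3), context.shape)
```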

Improving Neural Machine Translation Models with Monolingual Data

This work pairs monolingual training data with automatic back-translations, which can then be treated as additional parallel training data, and obtains substantial improvements on the WMT 15 English-German task and on the low-resource IWSLT 14 Turkish→English task.
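
The recipe itself is a one-liner: translate target-side monolingual text into the source language with a reverse model and append the synthetic pairs to the parallel corpus. A sketch with a hypothetical reverse_translate function:

```python
def back_translate(mono_target, reverse_translate, parallel):
    """Augment parallel data with synthetic source sentences.
    mono_target: target-language sentences;
    reverse_translate: hypothetical target->source MT system;
    parallel: list of (source, target) pairs."""
    synthetic = [(reverse_translate(t), t) for t in mono_target]
    return parallel + synthetic  # train the source->target model on the union
```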