Improving Alignment for SMT by Reordering and Augmenting the Training Corpus

Abstract

We describe the LIU systems for English-German and German-English translation in the WMT09 shared task. We focus on two methods to improve the word alignment: (i) by applying Giza++ in a second phase to a reordered training corpus, where reordering is based on the alignments from the first phase, and (ii) by adding lexical data obtained as high-precision alignments from a different word aligner. These methods were studied in the context of a system that uses compound processing, a morphological sequence model for German, and a part-of-speech sequence model for English. Both methods gave some improvements to translation quality as measured by Bleu and Meteor scores, though not consistently. All systems used both out-of-domain and in-domain data, as the mixed corpus had better scores in the baseline configuration.
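
Method (i) can be illustrated with a minimal sketch of the reordering step. The abstract does not give the details, so everything here is an assumption made for illustration: the hypothetical function `reorder_source`, the choice to reorder the source side, the representation of alignments as `(source_index, target_index)` pairs, and the rule that an unaligned word stays next to the word before it. The sketch only shows how first-phase alignment links could bring one side of a sentence pair closer to the word order of the other before a second alignment pass.

```python
from typing import Dict, List, Tuple


def reorder_source(source: List[str], links: List[Tuple[int, int]]) -> List[str]:
    """Reorder source tokens to follow the order of their aligned target tokens.

    Illustrative assumption, not the authors' published procedure.
    `links` holds (source_index, target_index) pairs from a first alignment pass.
    """
    # For each source position, keep the lowest target position it is linked to.
    first_target: Dict[int, int] = {}
    for src_idx, tgt_idx in links:
        if src_idx not in first_target or tgt_idx < first_target[src_idx]:
            first_target[src_idx] = tgt_idx

    # Unaligned tokens inherit the sort key of the preceding token, so they
    # stay attached to it; unaligned tokens at the start stay at the start.
    keys: List[int] = []
    previous_key = -1
    for src_idx in range(len(source)):
        key = first_target.get(src_idx, previous_key)
        keys.append(key)
        previous_key = key

    # Sort by target position; ties keep the original source order.
    order = sorted(range(len(source)), key=lambda i: (keys[i], i))
    return [source[i] for i in order]


if __name__ == "__main__":
    german = ["ich", "habe", "das", "Buch", "gelesen"]
    # Hypothetical links to the English sentence "I have read the book".
    links = [(0, 0), (1, 1), (2, 3), (3, 4), (4, 2)]
    print(reorder_source(german, links))
```

In this sketch the reordered sentences would replace the originals in the training corpus for the second alignment pass, and the resulting links would then have to be mapped back to the original word positions.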

Cite this paper

@inproceedings{Holmqvist2009ImprovingAF, title={Improving Alignment for SMT by Reordering and Augmenting the Training Corpus}, author={Maria Holmqvist and Sara Stymne and Jody Foo and Lars Ahrenberg}, booktitle={WMT@EACL}, year={2009} }