Experiments in Domain Adaptation for Statistical Machine Translation

Abstract

The special challenge of the WMT 2007 shared task was domain adaptation. We took this opportunity to experiment with various ways of adapting a statistical machine translation system to a special domain (here: news commentary) when most of the training data is from a different domain (here: European Parliament speeches). This paper also describes the submission of the University of Edinburgh to the shared task.

1 Our framework: the Moses MT system

The open source Moses MT system (Koehn et al., 2007) was originally developed at the University of Edinburgh and received a major boost through a 2007 Johns Hopkins workshop. It is now used at several academic institutions as the basic infrastructure for statistical machine translation research.

The Moses system is an implementation of the phrase-based machine translation approach (Koehn et al., 2003). In this approach, an input sentence is first split into text chunks (so-called phrases), which are then mapped one-to-one to target phrases using a large phrase translation table. Phrases may be reordered, but typically a reordering limit (in our experiments, a maximum movement over 6 words) is used. See Figure 1 for an illustration.

Figure 1: Phrase-based statistical machine translation model: Input is split into text chunks (phrases) which are mapped using a large phrase translation table. Phrases are mapped one-to-one, and may be reordered.

Phrase translation probabilities, reordering probabilities and language model probabilities are combined to give each possible sentence translation a score. The decoding algorithm searches for the best-scoring translation, which the system outputs as the best translation.

The different system components h_i (phrase translation probabilities, language model, etc.) are combined in a log-linear model to obtain the score for the translation e for an input sentence f:

score(e, f) = exp ∑
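The log-linear combination of components described above can be illustrated with a small sketch. The formula is cut off in this copy of the text, so the sketch assumes the form commonly used in the phrase-based literature, in which each component h_i(e, f) is a log-probability multiplied by a weight λ_i and the score is the exponential of the weighted sum. The function names, component names, weights and probability values below are hypothetical and chosen only for illustration; they are not taken from the paper or from Moses.

```python
import math


def log_linear_score(features, weights):
    """Return exp(sum_i weight_i * h_i) for one candidate translation.

    `features` maps a component name to its value h_i (here a log-probability);
    `weights` maps the same names to their weights (assumed form, see lead-in).
    """
    return math.exp(sum(weights[name] * value for name, value in features.items()))


def best_translation(candidates, weights):
    """Pick the candidate with the highest log-linear score."""
    return max(candidates, key=lambda e: log_linear_score(candidates[e], weights))


# Hypothetical component weights (illustrative values only).
weights = {"phrase_translation": 1.0, "reordering": 0.5, "language_model": 1.0}

# Hypothetical candidate translations with made-up component probabilities.
candidates = {
    "candidate A": {
        "phrase_translation": math.log(0.4),
        "reordering": math.log(0.9),
        "language_model": math.log(0.05),
    },
    "candidate B": {
        "phrase_translation": math.log(0.3),
        "reordering": math.log(0.5),
        "language_model": math.log(0.1),
    },
}

if __name__ == "__main__":
    for name, feats in candidates.items():
        print(name, log_linear_score(feats, weights))
    print("best:", best_translation(candidates, weights))
```

With all weights set to 1, the score reduces to the plain product of the component probabilities; changing the weights shifts how much each component influences the ranking. A real decoder searches over a very large space of segmentations and reorderings rather than scoring a fixed list of candidates.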

Cite this paper

@inproceedings{Koehn2007ExperimentsID, title={Experiments in Domain Adaptation for Statistical Machine Translation}, author={Philipp Koehn and Josh Schroeder}, booktitle={WMT@ACL}, year={2007} }