Experiments in Domain Adaptation for Statistical Machine Translation


The special challenge of the WMT 2007 shared task was domain adaptation. We took this opportunity to experiment with various ways of adapting a statistical machine translation system to a special domain (here: news commentary) when most of the training data is from a different domain (here: European Parliament speeches). This paper also describes the submission of the University of Edinburgh to the shared task.

The open source Moses (Koehn et al., 2007) MT system was originally developed at the University of Edinburgh and received a major boost through a 2007 Johns Hopkins workshop. It is now used at several academic institutions as the basic infrastructure for statistical machine translation research.

The Moses system is an implementation of the phrase-based machine translation approach (Koehn et al., 2003). In this approach, an input sentence is first split into text chunks (so-called phrases), which are then mapped one-to-one to target phrases using a large phrase translation table. Phrases may be reordered, but typically a reordering limit (in our experiments, a maximum movement over 6 words) is used. See Figure 1 for an illustration.

Figure 1: Phrase-based statistical machine translation model: Input is split into text chunks (phrases) which are mapped using a large phrase translation table. Phrases are mapped one-to-one, and may be reordered.

Phrase translation probabilities, reordering probabilities and language model probabilities are combined to give each possible sentence translation a score. The decoding algorithm searches for the best-scoring translation, and the system outputs it as the best translation. The different system components $h_i$ (phrase translation probabilities, language model, etc.) are combined in a log-linear model to obtain the score for the translation $e$ of an input sentence $f$:

$$\mathrm{score}(e, f) = \exp \sum_i \lambda_i \, h_i(e, f) \tag{1}$$

The weights of the components $\lambda_i$ are set by a discriminative training method on held-out development data (Och, 2003). The basic components used in our experiments are: (a) two phrase translation probabilities (both p(e|f) and p(f|e)), (b) two word translation probabilities (both p(e|f) and p(f|e)), (c) phrase count, (d) output word count, (e) language model, (f) distance-based reordering model, and (g) lexicalized reordering model. For a more detailed description of this model, please refer to Koehn et al. (2005).

Since training data for statistical machine translation is typically collected opportunistically from wherever it is available, the application domain for a …
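As an illustration of the log-linear scoring in Equation 1, the following minimal Python sketch computes exp(Σ λ_i h_i) for candidate translations and picks the highest-scoring one. It is not the Moses implementation: the function names, the component names (`lm`, `tm`), and all numeric values are hypothetical, and the feature values are assumed to be log-probabilities, which the text above does not state explicitly.

```python
import math


def loglinear_score(features, weights):
    """Return exp(sum_i lambda_i * h_i) for one candidate translation.

    `features` maps a component name to its value h_i(e, f);
    `weights` maps the same names to the weights lambda_i.
    Both dictionaries are hypothetical stand-ins for real model components.
    """
    if features.keys() != weights.keys():
        raise ValueError("features and weights must cover the same components")
    return math.exp(sum(weights[name] * features[name] for name in features))


def best_translation(candidates, weights):
    """Pick the candidate with the highest log-linear score."""
    return max(candidates, key=lambda c: loglinear_score(c["features"], weights))


# Hypothetical weights and two hypothetical candidates (values assumed to be log-probabilities).
weights = {"lm": 0.5, "tm": 1.0}
candidates = [
    {"text": "candidate A", "features": {"lm": -2.0, "tm": -1.0}},
    {"text": "candidate B", "features": {"lm": -1.0, "tm": -3.0}},
]

best = best_translation(candidates, weights)
```

Because the exponential is monotonic, ranking candidates by the weighted sum alone gives the same winner; the sketch keeps the exponential only to mirror the form of Equation 1.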

Cite this paper

@inproceedings{Koehn2007ExperimentsID, title={Experiments in Domain Adaptation for Statistical Machine Translation}, author={Philipp Koehn and Josh Schroeder}, year={2007} }