Learn More
We describe an open-source toolkit for statistical machine translation whose novel contributions are (a) support for linguistically motivated factors, (b) confusion network decoding, and (c) efficient data formats for translation models and language models. In addition to the SMT decoder, the toolkit also includes a wide variety of tools for training,(More)
The 2006 Language Engineering Workshop Open Source Toolkit for Statistical Machine Translation had the objective to advance the current state-of-the-art in statistical machine translation through richer input and richer annotation of the training data. The workshop focused on three topics: factored translation models, confusion network decoding, and the(More)
This paper describes advances in the use of confusion networks as interface between automatic speech recognition and machine translation. In particular, it presents an implementation of a confusion network decoder which significantly improves both in efficiency and performance previous work along this direction. The confusion network decoder results as an(More)
Domain adaptation has recently gained interest in statistical machine translation to cope with the performance drop observed when testing conditions deviate from training conditions. The basic idea is that in-domain training data can be exploited to adapt all components of an already developed system. Previous work showed small performance gains by adapting(More)
This paper investigates the impact of misspelled words in statistical machine translation and proposes an extension of the translation engine for handling misspellings. The enhanced system decodes a word-based confusion network representing spelling variations of the input text. We present extensive experimental results on two translation tasks of(More)
—This paper describes an approach for computing a consensus translation from the outputs of multiple machine translation (MT) systems. The consensus translation is computed by weighted majority voting on a confusion network, similarly to the well-established ROVER approach of Fiscus for combining speech recognition hypotheses. To create the confusion(More)
We describe an open-source soware for minimum error rate training (MERT) for statistical machine translation (SMT). is was implemented within the Moses toolkit, although it is essentially standsalone, with the aim of replacing the existing implementation with a cleaner, more flexible design, in order to facilitate further research in weight optimisation.(More)