Statistical Phrase-Based Translation

Abstract

We propose a new phrase-based translation model and decoding algorithm that enables us to evaluate and compare several previously proposed phrase-based translation models. Within our framework, we carry out a large number of experiments to better understand and explain why phrase-based models outperform word-based models. Our empirical results, which hold for all examined language pairs, suggest that the highest levels of performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translations. Surprisingly, learning phrases longer than three words and learning phrases from high-accuracy word-level alignment models do not have a strong impact on performance. Learning only syntactically motivated phrases degrades the performance of our systems.
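The abstract names heuristic learning of phrase translations from word-based alignments but does not spell out the procedure. As an illustration only, the Python sketch below uses the commonly cited consistency criterion: a source span and a target span form a phrase pair when at least one alignment link falls inside the pair and no link connects a word inside one span to a word outside the other. The function name `extract_phrase_pairs`, its parameters, and the toy sentence pair are hypothetical and are not taken from the paper. The default `max_len=3` is an assumption chosen to echo the abstract's remark about phrases longer than three words.

```python
from typing import List, Set, Tuple


def extract_phrase_pairs(
    src: List[str],
    tgt: List[str],
    alignment: Set[Tuple[int, int]],
    max_len: int = 3,  # assumed cap on phrase length, not a value from the paper
) -> Set[Tuple[str, str]]:
    """Collect phrase pairs that are consistent with a word alignment.

    `alignment` holds (source_index, target_index) links, both 0-based.
    """
    pairs: Set[Tuple[str, str]] = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            for t_start in range(len(tgt)):
                for t_end in range(t_start, min(t_start + max_len, len(tgt))):
                    has_link_inside = False
                    consistent = True
                    for i, j in alignment:
                        in_src = s_start <= i <= s_end
                        in_tgt = t_start <= j <= t_end
                        if in_src and in_tgt:
                            has_link_inside = True
                        elif in_src != in_tgt:
                            # The link leaves the candidate pair on one side.
                            consistent = False
                            break
                    if has_link_inside and consistent:
                        pairs.add(
                            (
                                " ".join(src[s_start : s_end + 1]),
                                " ".join(tgt[t_start : t_end + 1]),
                            )
                        )
    return pairs


# Toy example with a one-to-one, monotone alignment (hypothetical data).
src_words = ["das", "Haus", "ist", "klein"]
tgt_words = ["the", "house", "is", "small"]
links = {(0, 0), (1, 1), (2, 2), (3, 3)}
phrase_pairs = extract_phrase_pairs(src_words, tgt_words, links)
```

The brute-force loop over all span pairs keeps the sketch short and is adequate for toy inputs. A practical system would restrict candidate target spans to those induced by the links of each source span, and would also decide how to treat unaligned words at span boundaries.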


8 Figures and Tables

[Chart: Citations per Year, 2004 to 2016; vertical axis 0 to 300]

3,380 Citations

Semantic Scholar estimates that this publication has 3,380 citations based on the available data.


Cite this paper

@inproceedings{Koehn2003StatisticalPT,
  title={Statistical Phrase-Based Translation},
  author={Philipp Koehn and Franz Josef Och and Daniel Marcu},
  booktitle={HLT-NAACL},
  year={2003}
}