Empirical Methods for Compound Splitting


Compounded words are a challenge for NLP applications such as machine translation (MT). We introduce methods to learn splitting rules from monolingual and parallel corpora. We evaluate them against a gold standard and measure their impact on the performance of statistical MT systems. Results show an accuracy of 99.1% against the gold standard and a gain of 0.039 BLEU on a German-English noun phrase translation task.
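The abstract does not spell out how splitting rules are learned. One frequency-based way to learn splits from a monolingual corpus is to enumerate every segmentation of a word into known corpus words and keep the one whose parts have the highest geometric mean frequency. The sketch below illustrates that idea only. The word counts, the list of linking elements, the minimum part length, and all function names are assumptions made for illustration, not details taken from this page.

```python
from math import prod

# Illustrative word counts (assumed); a real system would count
# word occurrences in a large monolingual corpus.
COUNTS = {
    "aktionsplan": 852, "aktion": 960, "aktions": 5, "plan": 710,
    "akt": 224, "ion": 1,
    "handelsabkommen": 2, "handel": 500, "abkommen": 300,
}

FILLERS = ("", "s", "es")  # assumed linking elements between compound parts
MIN_PART_LEN = 3           # assumed minimum length of a part


def candidate_splits(word):
    """Yield each segmentation of `word` into known words, including no split."""
    if word in COUNTS:
        yield [word]
    for i in range(MIN_PART_LEN, len(word) - MIN_PART_LEN + 1):
        head, rest = word[:i], word[i:]
        if head not in COUNTS:
            continue
        for filler in FILLERS:
            # Drop the linking element, then split the remainder recursively.
            if rest.startswith(filler) and len(rest) - len(filler) >= MIN_PART_LEN:
                for tail in candidate_splits(rest[len(filler):]):
                    yield [head] + tail


def score(parts):
    """Geometric mean of the corpus frequencies of the parts."""
    return prod(COUNTS[p] for p in parts) ** (1.0 / len(parts))


def split_compound(word):
    """Return the highest-scoring segmentation, or the word itself if unknown."""
    word = word.lower()
    options = list(candidate_splits(word))
    return max(options, key=score) if options else [word]


print(split_compound("Handelsabkommen"))
print(split_compound("Aktionsplan"))
```

With these assumed counts, a rare compound whose parts are frequent gets split, while a compound that is itself frequent stays whole. That second case shows why a purely monolingual criterion can under-split, and why evidence from a parallel corpus is a natural complement.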

Semantic Scholar estimates that this publication has 374 citations based on the available data.

Cite this paper

@inproceedings{Koehn2003EmpiricalMF, title={Empirical Methods for Compound Splitting}, author={Philipp Koehn and Kevin Knight}, booktitle={EACL}, year={2003} }