Empirical Methods for Compound Splitting

Abstract

Compounded words are a challenge for NLP applications such as machine translation (MT). We introduce methods to learn splitting rules from monolingual and parallel corpora. We evaluate them against a gold standard and measure their impact on the performance of statistical MT systems. Results show an accuracy of 99.1% and performance gains for MT of 0.039 BLEU on a German-English noun phrase translation task.
