Uzbek-English and Turkish-English Morpheme Alignment Corpora

Xuansong Li, Jennifer Tracey, Stephen Grimes, Stephanie Strassel
Morphologically rich languages pose problems for machine translation (MT) systems, including word-alignment errors, data sparsity, and multiple affixes. Current word-level alignment models do not distinguish between words and morphemes, which yields low-quality alignments and in turn degrades final translation quality. Models that use morpheme-level alignment can reduce the vocabulary size of morphologically rich languages and overcome data sparsity. The alignment data based on smallest units…