Vocabulary Extension via PoS Information for SMT

Germán Sanchis and J. A. Sánchez
One of the weaknesses of so-called phrase-based translation models is that they extract the phrase translation table blindly, i.e., without taking into account the linguistic information inherent to each language. Part-of-Speech (PoS) tagging, on the other hand, is a problem with a mature state of the art, with error rates close to 2%. Because of this, the use of automatically PoS-tagged corpora in Statistical Machine Translation…