A Simple but Effective Approach to Improve Arabizi-to-English Statistical Machine Translation


A major challenge for statistical machine translation (SMT) of Arabic-to-English user-generated text is the prevalence of text written in Arabizi, or Romanized Arabic. When facing such texts, a translation system trained on conventional Arabic-English data will suffer from extremely low model coverage. In addition, Arabizi is not regulated by any official…

