Standardizing Tweets with Character-Level Machine Translation


This paper presents the results of standardizing Slovene tweets, which are full of colloquial, dialectal and foreign-language elements. To minimize the human input required, we produced a manually normalized lexicon of the most salient out-of-vocabulary (OOV) tokens and used it to train a character-level statistical machine…
DOI: 10.1007/978-3-642-54903-8_14
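
As a rough illustration of what "character-level" training data can look like, the sketch below turns a normalized lexicon into a parallel corpus in which every character is a separate symbol, which is a common way to feed word pairs to a phrase-based translation toolkit. It is not taken from the paper: the function names, the `_` boundary symbol, and the example word pairs are assumptions made for illustration only.

```python
# Minimal sketch: prepare character-level parallel data from a lexicon of
# (non-standard token, standard form) pairs. All names here are hypothetical.

def to_char_sequence(token, boundary="_"):
    """Return the token as space-separated characters.

    Literal spaces inside the token are replaced by `boundary` so they
    survive the character split.
    """
    return " ".join(boundary if ch == " " else ch for ch in token)


def from_char_sequence(line, boundary="_"):
    """Invert to_char_sequence: join symbols and restore literal spaces."""
    return "".join(" " if sym == boundary else sym for sym in line.split(" "))


def build_parallel_corpus(lexicon):
    """Split a list of (source, target) pairs into two aligned line lists."""
    source_lines = [to_char_sequence(src) for src, _ in lexicon]
    target_lines = [to_char_sequence(tgt) for _, tgt in lexicon]
    return source_lines, target_lines


if __name__ == "__main__":
    # Illustrative pairs only; they are not entries from the paper's lexicon.
    lexicon = [("jst", "jaz"), ("tud", "tudi")]
    src, tgt = build_parallel_corpus(lexicon)
    for s, t in zip(src, tgt):
        print(f"{s}\t{t}")
```

The two line lists would typically be written to a pair of aligned files, one token per line, and the translation system's output would be passed through `from_char_sequence` to recover ordinary words.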




