Standardizing Tweets with Character-Level Machine Translation


This paper presents the results of standardizing Slovene tweets, which are full of colloquial, dialectal and foreign-language elements. With the aim of minimizing the required human input, we produced a manually normalized lexicon of the most salient out-of-vocabulary (OOV) tokens and used it to train a character-level statistical machine…
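The abstract does not say how the lexicon is turned into training data for the character-level translation system. One common setup, assumed here, treats each lexicon entry as a "sentence" whose "words" are single characters, so that a standard phrase-based SMT toolkit learns character n-gram rewrites (for example `j s t` → `j a z`). The sketch below only prepares such parallel data and inverts the decoder output. The lexicon entries, the `_` boundary symbol and all function names are hypothetical illustrations, not the paper's data or code. Training and decoding are left to the SMT toolkit.

```python
from typing import Iterable

# Hypothetical (non-standard token, normalized form) pairs, for illustration only;
# they are NOT taken from the paper's lexicon.
LEXICON = [
    ("jst", "jaz"),
    ("kva", "kaj"),
]

def to_char_sequence(token: str, boundary: str = "_") -> str:
    """Split a token into space-separated characters so that a phrase-based
    SMT system treats each character as a 'word'. Literal spaces inside a
    token are replaced by a (hypothetical) boundary symbol."""
    return " ".join(boundary if ch == " " else ch for ch in token)

def from_char_sequence(chars: str, boundary: str = "_") -> str:
    """Invert to_char_sequence on a decoded character sequence."""
    return "".join(" " if c == boundary else c for c in chars.split(" "))

def build_parallel_corpus(
    pairs: Iterable[tuple[str, str]],
) -> tuple[list[str], list[str]]:
    """Turn lexicon pairs into two line-aligned lists of character
    'sentences': source (non-standard) and target (normalized)."""
    src: list[str] = []
    tgt: list[str] = []
    for original, normalized in pairs:
        src.append(to_char_sequence(original))
        tgt.append(to_char_sequence(normalized))
    return src, tgt

if __name__ == "__main__":
    source_lines, target_lines = build_parallel_corpus(LEXICON)
    for s, t in zip(source_lines, target_lines):
        print(f"{s}\t{t}")
```

Each pair of aligned lines would then be written to separate source and target files for the toolkit. The boundary symbol keeps multi-word entries reversible, under the assumption that `_` never occurs literally in the tokens.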
DOI: 10.1007/978-3-642-54903-8_14


3 Figures and Tables

