Standardizing Tweets with Character-Level Machine Translation

Abstract

This paper presents the results of standardizing Slovene tweets, which are full of colloquial, dialectal and foreign-language elements. To minimize the human input required, we produced a manually normalized lexicon of the most salient out-of-vocabulary (OOV) tokens and used it to train a character-level statistical machine…
DOI: 10.1007/978-3-642-54903-8_14
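The abstract states that a manually normalized lexicon of OOV tokens serves as training data for a character-level statistical machine translation system. The abstract does not describe the preprocessing, so the sketch below shows one common way to set this up, under stated assumptions. First, it collects tokens missing from a reference vocabulary as OOV candidates. Then it rewrites each lexicon entry (non-standard form to standard form) as a space-separated character sequence, so that a phrase-based SMT toolkit treats every character as a "word". The function names, the `_` boundary symbol and the toy Slovene pairs are hypothetical illustrations. None of them are taken from the paper's actual lexicon or pipeline.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

# Hypothetical token-boundary symbol; the paper's actual preprocessing is not
# given in the abstract.
BOUNDARY = "_"


def find_oov(tokens: Iterable[str], vocabulary: set[str]) -> Counter:
    """Count (lowercased) tokens that do not appear in a reference vocabulary."""
    return Counter(t for t in (tok.lower() for tok in tokens) if t not in vocabulary)


def to_char_sequence(word: str) -> str:
    """Turn a word into a space-separated character sequence framed by
    boundary markers, e.g. 'jst' -> '_ j s t _'."""
    return " ".join([BOUNDARY, *word, BOUNDARY])


def build_parallel_corpus(lexicon: dict[str, str]) -> tuple[list[str], list[str]]:
    """Split a normalization lexicon (non-standard -> standard) into aligned
    source and target lines of character sequences, one entry per line."""
    source: list[str] = []
    target: list[str] = []
    for nonstandard, standard in sorted(lexicon.items()):
        source.append(to_char_sequence(nonstandard))
        target.append(to_char_sequence(standard))
    return source, target


if __name__ == "__main__":
    # Toy data for illustration only (colloquial -> standard Slovene).
    vocabulary = {"jaz", "sem", "tudi", "doma"}
    tweet = "jst sm tud doma".split()
    print(find_oov(tweet, vocabulary))

    lexicon = {"jst": "jaz", "sm": "sem", "tud": "tudi"}
    src, tgt = build_parallel_corpus(lexicon)
    for s, t in zip(src, tgt):
        print(f"{s}\t{t}")
```

The two line lists would then be written to aligned source and target files and passed to a phrase-based SMT toolkit (Moses is one option; the abstract does not name the toolkit used). At normalization time, an OOV token would be converted with `to_char_sequence`, translated, and joined back into a word.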


3 Figures and Tables

Statistics

[Figure: Citations per Year, 2015 to 2017]

Citation Velocity: 9

Averaging 9 citations per year over the last 3 years.

