Text normalization

Known as: N11n, Normalization, Text normalisation 
Text normalization is the process of transforming text into a single canonical form that it might not have had before. Normalizing text before…
Wikipedia
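Mapping text to "a single canonical form" can be made concrete with a minimal Python sketch. The steps are assumptions chosen for illustration: Unicode NFKC normalization, case folding, and whitespace collapsing. Which steps count as canonical depends on the application, and the function name `normalize_text` is hypothetical.

```python
import re
import unicodedata

def normalize_text(text: str) -> str:
    """Map text to one canonical form (an illustrative choice of steps)."""
    # NFKC folds compatibility variants (the "ﬁ" ligature -> "fi",
    # full-width "Ｔ" -> "T") and composes base letter + combining accent.
    text = unicodedata.normalize("NFKC", text)
    # casefold() is a more aggressive lower(): German "ß" -> "ss".
    text = text.casefold()
    # Collapse runs of whitespace to single spaces and trim both ends.
    return re.sub(r"\s+", " ", text).strip()
```

With these steps, differently encoded or differently cased inputs compare equal. For example, `normalize_text("Café")` and `normalize_text("CAFE\u0301")` both yield `"café"`. NFKC is deliberately lossy, so a system that must preserve ligatures or full-width forms would use NFC instead.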

Papers overview

Semantic Scholar uses AI to extract papers important to this topic.
2017
Uyghur is the second largest and most actively used social media language in China. However, a non-negligible part of Uyghur text…
2016
This paper presents a challenge to the community: given a large corpus of written text aligned to its normalized spoken form…
2014
The informal nature of social media text makes it very difficult to process automatically with natural language processing…
2013 (Highly Cited)
We present a unified unsupervised statistical model for text normalization. The relationship between standard and non-standard…
2008
This paper describes a text normalization system for the Bangla language (exonym: Bengali) that works by identifying the semiotic…
2006 (Highly Cited)
Short Messaging Service (SMS) texts behave quite differently from normal written texts and have some very special phenomena. To…
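The truncated abstract above does not say which method the paper uses. As a point of reference only, the simplest form of SMS normalization is a token-level lexicon lookup, sketched here in Python. This is a plain baseline, not the paper's approach. The `SMS_LEXICON` entries and the `normalize_sms` name are hypothetical; a practical system would learn such mappings from data and handle context.

```python
import re

# Hypothetical mini-lexicon of SMS forms -> standard words.
SMS_LEXICON = {
    "u": "you",
    "r": "are",
    "2moro": "tomorrow",
    "gr8": "great",
    "pls": "please",
    "thx": "thanks",
}

def normalize_sms(message: str) -> str:
    """Replace known SMS tokens with standard words; keep everything else."""
    # Split into word runs and non-word runs so spacing/punctuation survive.
    tokens = re.findall(r"\w+|\W+", message)
    out = []
    for tok in tokens:
        replacement = SMS_LEXICON.get(tok.lower())
        out.append(replacement if replacement is not None else tok)
    return "".join(out)
```

For example, `normalize_sms("r u free 2moro? thx")` gives `"are you free tomorrow? thanks"`. A lookup table cannot resolve ambiguous tokens whose meaning depends on context, which is where statistical models come in.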
2006 (Highly Cited)
While naive Bayes is quite effective in various data mining tasks, it shows a disappointing result in the automatic text…
2002 (Highly Cited)
Text normalization is an important aspect of successful information retrieval from medical documents such as clinical notes…
2001 (Highly Cited)
In addition to ordinary words and names, real text contains non-standard “words” (NSWs), including numbers, abbreviations, dates…
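Numbers are the most common kind of non-standard word, and a speech system must verbalize them. The following minimal Python sketch handles only one NSW type, English cardinals below one million. The function names and the range limit are assumptions made for illustration.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

def number_to_words(n: int) -> str:
    """Verbalize 0 <= n < 1_000_000 as an English cardinal."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("" if rest == 0 else "-" + ONES[rest])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        head = ONES[hundreds] + " hundred"
        return head if rest == 0 else head + " " + number_to_words(rest)
    if n < 1_000_000:
        thousands, rest = divmod(n, 1000)
        head = number_to_words(thousands) + " thousand"
        return head if rest == 0 else head + " " + number_to_words(rest)
    raise ValueError("only 0 <= n < 1,000,000 is handled in this sketch")

def expand_numbers(text: str) -> str:
    """Replace each standalone run of digits with its spoken cardinal form."""
    return re.sub(r"\b\d+\b", lambda m: number_to_words(int(m.group())), text)
```

Because this sketch reads every digit string as a cardinal, `expand_numbers("Gate 12 opens at 1415")` gives `"Gate twelve opens at one thousand four hundred fifteen"`, even though 1415 is more likely a time here. Handling such cases correctly requires first deciding what kind of NSW a token is (cardinal, year, time, phone number, and so on) and only then verbalizing it.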
1997
In this paper we present a quantitative investigation into the impact of text normalization on lexica and language models for…
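One direct effect of normalization on a lexicon is that surface variants merge into fewer word types, each with more occurrences. The toy Python example below shows this effect. The corpus and both normalization steps (case folding and masking every digit as "0") are hypothetical choices made for illustration, and the example does not reproduce the paper's experiments.

```python
import re
from collections import Counter

def vocabulary(text: str, normalize: bool) -> Counter:
    """Count word types, optionally after a simple normalization step."""
    if normalize:
        # Hypothetical normalization: case folding, then map every digit
        # to "0" so that all numbers of the same length share one type.
        text = re.sub(r"\d", "0", text.casefold())
    return Counter(re.findall(r"\w+", text))

corpus = "The cat saw the Cat. THE dog was 3 and the cat was 4."
raw = vocabulary(corpus, normalize=False)
norm = vocabulary(corpus, normalize=True)
print(len(raw), len(norm))  # 11 word types before, 7 after
```

Fewer, better-populated types mean a smaller lexicon and less sparse language-model counts. The cost is that distinctions removed by normalization (here, case and digit identity) are no longer available downstream.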