Improving Text Normalization using Character-Blocks Based Models and System Combination

@inproceedings{Li2012ImprovingTN,
  title={Improving Text Normalization using Character-Blocks Based Models and System Combination},
  author={Chen Li and Yang Liu},
  booktitle={COLING},
  year={2012}
}
There are many abbreviations and non-standard tokens in SMS and Twitter messages. Normalizing these non-standard tokens will ease the work of natural language processing modules in these domains. Recently, character-level machine translation (MT) and sequence labeling methods have been used for this normalization task and have demonstrated competitive performance. In this paper, we propose an approach to segment words into blocks of characters according to their phonetic symbols, and apply MT and sequence…
This paper has 20 citations.