Normalizing tweets with edit scripts and recurrent neural embeddings

Grzegorz Chrupala
Tweets often contain a large proportion of abbreviations, alternative spellings, novel words and other non-canonical language. These features are problematic for standard language analysis tools, and it can be desirable to convert them to canonical form. We propose a novel text normalization model based on learning edit operations from labeled data while incorporating features induced from unlabeled data via character-level neural text embeddings. The text embeddings are generated using an…
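The abstract frames normalization as learning edit operations from labeled pairs. As an illustration only, the sketch below shows one way to turn a (tweet token, canonical form) pair into a per-character edit script, which makes normalization a sequence-labeling problem. The labeling scheme (each input character is mapped to the string that replaces it, with insertions attached to the preceding character) and the function name `edit_script` are assumptions for this example, not necessarily the encoding used in the paper; the alignment comes from Python's standard `difflib.SequenceMatcher`.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def edit_script(source: str, target: str) -> list[str]:
    """Hypothetical encoding: return one output string per source character.

    Concatenating the returned labels always reproduces `target`, so a
    sequence labeler that predicts one label per character can normalize text.
    """
    if not source:
        raise ValueError("source must be non-empty")
    labels = [""] * len(source)
    prefix = ""  # text inserted before the first source character
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged characters are copied through as-is.
            for k in range(i1, i2):
                labels[k] = source[k]
        elif tag == "delete":
            # Deleted characters map to the empty string.
            for k in range(i1, i2):
                labels[k] = ""
        elif tag == "replace":
            # The whole replacement goes on the first character of the span;
            # the remaining characters of the span are deleted.
            labels[i1] = target[j1:j2]
            for k in range(i1 + 1, i2):
                labels[k] = ""
        elif tag == "insert":
            # Inserted text is appended to the preceding character's label,
            # or kept as a prefix if it occurs before the first character.
            if i1 == 0:
                prefix += target[j1:j2]
            else:
                labels[i1 - 1] += target[j1:j2]
    labels[0] = prefix + labels[0]
    return labels


if __name__ == "__main__":
    noisy, canonical = "u r gr8", "you are great"
    script = edit_script(noisy, canonical)
    print(list(zip(noisy, script)))
    print("".join(script))
```

Because every label is tied to exactly one input character, a classifier could predict these labels from character-level features (for instance, features derived from embeddings trained on unlabeled text), and the normalized form is recovered by joining the predicted labels.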
This paper has 53 citations.

Semantic Scholar estimates that this publication has 54 citations based on the available data.
