Improving Chinese Word Segmentation on Micro-blog Using Rich Punctuations

Longkai Zhang, Li Li, Zhengyan He, Houfeng Wang, Ni Sun
Micro-blogs are a new kind of medium whose texts are short and informal. Because no segmented corpus of micro-blogs is available to train a Chinese word segmentation model, existing Chinese word segmentation tools do not perform as well on micro-blogs as on ordinary news text. In this paper we present a simple yet effective approach to Chinese word segmentation of micro-blogs. In our approach, we incorporate punctuation information from unlabeled micro-blog data by introducing characters behind or ahead of…
This paper has 18 citations.
