A Chinese Word Segmentation System Based on Structured Support Vector Machine Utilization of Unlabeled Text Corpus

@inproceedings{Zhang2010ACW,
  title={A Chinese Word Segmentation System Based on Structured Support Vector Machine Utilization of Unlabeled Text Corpus},
  author={Chongyang Zhang and Zhigang Chen and Guoping Hu},
  booktitle={CIPS-SIGHAN},
  year={2010}
}
Character-based tagging has achieved great success in Chinese Word Segmentation (CWS). This paper proposes a new approach that improves CWS tagging accuracy by combining a structured support vector machine (SVM) with an unlabeled text corpus. First, character N-grams in the unlabeled corpus are mapped into a low-dimensional space using the self-organizing map (SOM) algorithm. Then new features extracted from these maps, together with another kind of feature based on the entropy of each N-gram, are integrated into the…
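
As a rough illustration of what character-based tagging means, the sketch below converts a segmented sentence into per-character position tags and back. The B/M/E/S tag set and the function names `words_to_tags` and `tags_to_words` are assumptions made for this example; the abstract does not say which tag set the authors use, and the structured SVM, SOM, and entropy features are not modeled here.

```python
def words_to_tags(words):
    """Turn a list of words into (character, tag) pairs.

    Assumed tag set: B = word begin, M = word middle,
    E = word end, S = single-character word.
    """
    pairs = []
    for word in words:
        if not word:
            continue  # ignore empty strings
        if len(word) == 1:
            pairs.append((word, "S"))
        else:
            pairs.append((word[0], "B"))
            pairs.extend((ch, "M") for ch in word[1:-1])
            pairs.append((word[-1], "E"))
    return pairs


def tags_to_words(chars, tags):
    """Rebuild words from characters and their position tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on E or S
            words.append(current)
            current = ""
    if current:  # flush an unterminated word
        words.append(current)
    return words


if __name__ == "__main__":
    segmented = ["我", "喜欢", "自然语言"]
    pairs = words_to_tags(segmented)
    chars = [c for c, _ in pairs]
    tags = [t for _, t in pairs]
    print(tags)
    print(tags_to_words(chars, tags))
```

Under this framing, segmentation becomes a sequence labeling problem: a tagger predicts one tag per character, and word boundaries are read off the predicted tags.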

References

Publications referenced by this paper.

Cutting-Plane Training of Structural SVMs. Machine Learning Journal, 77(1):27-59

  • T. Joachims, T. Finley, Chun-Nam Yu
  • 2009

Chinese word segmentation: A decade review

  • Chang-Ning Huang, Hai Zhao
  • Journal of Chinese Information Processing, 21(3…
  • 2007

A Comparative Study on Chinese Word Clustering. Computer Processing of Oriental Languages

  • H. Wang
  • 2006