A Character-Based Joint Model for Chinese Word Segmentation

Abstract

The character-based tagging approach is a dominant technique for Chinese word segmentation, and both discriminative and generative models can be adopted within that framework. However, the generative and discriminative character-based approaches differ significantly and complement each other. This paper therefore proposes a simple joint model that combines the character-based generative model with the discriminative one to take advantage of both approaches. Experiments on the Second SIGHAN Bakeoff show that this joint approach achieves a 21% relative error reduction over the discriminative model and 14% over the generative one. In addition, closed tests show that the proposed joint model outperforms all existing approaches reported in the literature and achieves the best F-score on four out of five corpora.
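As a reading aid for the "relative error reduction" figures above, the following minimal sketch shows how such a number is commonly computed. It assumes the error rate is taken as 1 minus the F-score, which the abstract does not state explicitly; the function name and the F-scores in the usage line are hypothetical and are not results from the paper.

```python
def relative_error_reduction(baseline_f: float, new_f: float) -> float:
    """Return the fraction of the baseline error removed by the new system.

    Assumption: error is defined as 1 - F-score for both systems.
    """
    baseline_error = 1.0 - baseline_f
    if baseline_error <= 0.0:
        raise ValueError("baseline F-score must be below 1.0")
    new_error = 1.0 - new_f
    return (baseline_error - new_error) / baseline_error


# Hypothetical F-scores chosen only for illustration.
print(round(relative_error_reduction(0.950, 0.960), 3))
```

Under this definition, a small absolute gain in F-score can correspond to a large relative error reduction when the baseline is already strong, which is why the measure is often reported alongside raw F-scores.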

Cite this paper

@inproceedings{Wang2010ACJ, title={A Character-Based Joint Model for Chinese Word Segmentation}, author={Kun Wang and Chengqing Zong and Keh-Yih Su}, booktitle={COLING}, year={2010} }