Combination of Machine Learning Methods for Optimum Chinese Word Segmentation


This article presents our recent work for participation in the Second International Chinese Word Segmentation Bakeoff. Our system performs two procedures: out-of-vocabulary extraction and word segmentation. We compose three out-of-vocabulary extraction modules, all based on character-based tagging but with different classifiers: maximum entropy, support vector machines, and conditional random fields. We also compose three word segmentation modules: character-based tagging by a maximum entropy classifier, a maximum entropy Markov model, and conditional random fields. All modules are based on previously proposed methods. We submitted three systems, which are different combinations of the modules.
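
To illustrate what character-based tagging means in general, the sketch below converts a segmented sentence into one position tag per character and back again. The four-tag B/I/E/S scheme (begin, inside, end, single) and the function names `words_to_tags` and `tags_to_words` are assumptions chosen for illustration; the abstract does not state which tag set the modules use, and the classifier that would predict the tags is omitted.

```python
from typing import List


def words_to_tags(words: List[str]) -> List[str]:
    """Turn a segmented sentence into one position tag per character.

    Assumed tag set: B (begin), I (inside), E (end), S (single-character word).
    """
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and their position tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        # A word closes on E (end of a multi-character word) or S (single).
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        # Tolerate a tag sequence that ends mid-word.
        words.append(current)
    return words


segmented = ["我", "喜欢", "自然语言"]
tags = words_to_tags(segmented)
restored = tags_to_words("".join(segmented), tags)
```

Under this framing, segmentation reduces to predicting the tag sequence for an unsegmented character string, which is the step a maximum entropy, SVM, or CRF classifier would perform.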

Cite this paper

@inproceedings{Asahara2005CombinationOM,
  title={Combination of Machine Learning Methods for Optimum Chinese Word Segmentation},
  author={Masayuki Asahara and Kenta Fukuoka and Ai Azuma and Chooi-Ling Goh and Yotaro Watanabe and Yuji Matsumoto and Takashi Tsuzuki},
  year={2005}
}