Chinese Word Segmentation as Character Tagging

@article{Xu2003ChineseWS,
  title={Chinese Word Segmentation as Character Tagging},
  author={Nianwen Xue},
  journal={IJCLCLP},
  year={2003},
  volume={8}
}
In this paper we report results of a supervised machine-learning approach to Chinese word segmentation. A maximum entropy tagger is trained on manually annotated data to automatically assign to Chinese characters, or hanzi, tags that indicate the position of a hanzi within a word. The tagged output is then converted into segmented text for evaluation. Preliminary results show that this approach is competitive against other supervised machine-learning segmenters reported in previous studies…
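The conversion step described in the abstract (position tags in, segmented text out) can be illustrated with a small sketch. The abstract does not name the tags, so the four-tag scheme used here (`LL` for a word-initial hanzi, `MM` for word-internal, `RR` for word-final, `LR` for a single-hanzi word) and the function names are assumptions for illustration only, and the maximum entropy tagger itself is not shown.

```python
def words_to_tags(words):
    """Turn a segmented sentence into one position tag per character.

    Assumed tag set (illustrative): LL = first character of a word,
    MM = middle, RR = last, LR = a word made of a single character.
    """
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("LR")
        else:
            tags.extend(["LL"] + ["MM"] * (len(word) - 2) + ["RR"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild the segmented sentence from characters and their tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        # RR and LR both mark the right edge of a word.
        if tag in ("RR", "LR"):
            words.append(current)
            current = ""
    if current:
        # Tolerate a tag sequence that ends mid-word.
        words.append(current)
    return words


gold = ["我", "喜欢", "自然语言"]
tags = words_to_tags(gold)
restored = tags_to_words("".join(gold), tags)
```

In this sketch a word boundary is placed after every `RR` or `LR` tag, so a tagger only has to label each hanzi in order to produce a full segmentation.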
Highly Influential
This paper has highly influenced 24 other papers.
Highly Cited
This paper has 388 citations.

From This Paper

Key Quantitative Results

  • Preliminary results show that this approach is competitive against other supervised machine-learning segmenters reported in previous studies, achieving precision and recall rates of 95.01% and 94.94% respectively, trained on a 237K-word training set.
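For readers unfamiliar with how segmentation precision and recall are usually computed, here is a minimal sketch. It assumes the common word-level definition, in which a predicted word counts as correct only if both of its boundaries match the gold standard. The source does not state the paper's exact scoring procedure, and the function names are hypothetical.

```python
def word_spans(words):
    """Map a segmented sentence to a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def precision_recall(predicted, gold):
    """Word-level precision and recall for one segmented sentence."""
    pred_spans, gold_spans = word_spans(predicted), word_spans(gold)
    correct = len(pred_spans & gold_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    return precision, recall


gold = ["我", "喜欢", "自然语言"]
predicted = ["我", "喜欢", "自然", "语言"]
p, r = precision_recall(predicted, gold)
```

Here two of the four predicted words match gold words exactly, so precision is 2/4 and recall is 2/3. Over-segmentation lowers precision faster than recall.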

Citations

Publications citing this paper.

Survey: Finite-state technology in natural language processing

Theor. Comput. Sci. • 2017

A Simple and Effective Neural Model for Joint Word Segmentation and POS Tagging

IEEE/ACM Transactions on Audio, Speech, and Language Processing • 2018

Natural Language Processing and Chinese Computing

Lecture Notes in Computer Science • 2017

Natural Language Processing and Chinese Computing

Communications in Computer and Information Science • 2012

388 Citations

Citations per Year (chart: years 2003 to 2018, count axis 0 to 40)
Semantic Scholar estimates that this publication has 388 citations based on the available data.

References

Publications referenced by this paper.

A statistical method for finding word boundaries in Chinese text

R. Sproat, C. L. Shih
Computer Processing of Chinese and Oriental Languages • 1990

The Morphology of Chinese: A Linguistic and Cognitive Approach

Jerome Packard
2000

The Segmentation Guidelines for Chinese Treebank Project

Fei Xia
Technical Report IRCS 00-06 • 2000

Discovering Chinese words from unsegmented text

Xianping Ge, Wanda Pratt, Padhraic Smyth
1999
