Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data

@inproceedings{Sun1998ChineseWS,
  title={Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data},
  author={Maosong Sun and Dayang Shen and Benjamin Ka-Yin T'sou},
  booktitle={COLING-ACL},
  year={1998}
}
Chinese word segmentation is the first step in any Chinese NLP system. This paper presents a new algorithm for segmenting Chinese texts without making use of any lexicon and hand-crafted linguistic resource. The statistical data required by the algorithm, that is, mutual information and the difference of t-score between characters, is derived automatically from raw Chinese corpora. The preliminary experiment shows that the segmentation accuracy of our algorithm is acceptable. We hope the…
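The abstract names mutual information between adjacent characters as one of the statistics derived from raw text. The following is a minimal sketch of that idea only, not the paper's algorithm: it estimates pointwise mutual information from character and character-pair counts and places a boundary wherever the value falls below a cutoff. The function names, the `threshold` parameter, and the simple cutoff rule are assumptions made for illustration; the difference of t-score and the paper's actual decision procedure are not described in this excerpt and are left out.

```python
import math
from collections import Counter


def train_counts(corpus):
    """Count single characters and adjacent character pairs in raw text lines."""
    unigrams, bigrams = Counter(), Counter()
    for line in corpus:
        unigrams.update(line)                  # one count per character
        bigrams.update(zip(line, line[1:]))    # one count per adjacent pair
    return unigrams, bigrams


def mutual_information(x, y, unigrams, bigrams):
    """Pointwise mutual information log2(p(x,y) / (p(x) * p(y))) of an adjacent pair."""
    if bigrams[(x, y)] == 0:
        return float("-inf")                   # unseen pair: treat as unrelated
    p_xy = bigrams[(x, y)] / sum(bigrams.values())
    p_x = unigrams[x] / sum(unigrams.values())
    p_y = unigrams[y] / sum(unigrams.values())
    return math.log2(p_xy / (p_x * p_y))


def segment(text, unigrams, bigrams, threshold=0.0):
    """Hypothetical rule: join neighbours whose MI reaches `threshold`, else split."""
    if not text:
        return []
    words = [text[0]]
    for prev, cur in zip(text, text[1:]):
        if mutual_information(prev, cur, unigrams, bigrams) >= threshold:
            words[-1] += cur                   # strong association: same word
        else:
            words.append(cur)                  # weak association: new word
    return words
```

A call such as `segment(sentence, *train_counts(lines))` returns a list of substrings whose concatenation is the input sentence. A suitable `threshold` depends on the corpus and would need to be tuned.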
Building Chinese Lexicons from Scratch by Unsupervised Short Document Self-Segmentation
Unsupervised Segmentation of Chinese Corpus Using Accessor Variety
Word Frequency Approximation for Chinese Using Raw, MM-Segmented and Manually Segmented Corpora
A Local Generative Model for Chinese Word Segmentation
Binary Tree based Chinese Word Segmentation
Chinese Text Classification without Automatic Word Segmentation

References

A Trainable Rule-based Algorithm for Word Segmentation
Automatic Word Identification in Chinese Sentences by the Relaxation Technique
  • Computer Processing of Chinese & Oriental Languages,
  • 1988
CDWS: An Automatic Word Segmentation System for Written Chinese Texts
  • Journal of Chinese Information Processing,
  • 1987
Computer processing of Chinese & Oriental languages