Automatic extraction of new words based on Google News corpora for supporting lexicon-based Chinese word segmentation systems

@article{Hong2009AutomaticEO,
  title={Automatic extraction of new words based on Google News corpora for supporting lexicon-based Chinese word segmentation systems},
  author={Chin-Ming Hong and Chih-Ming Chen and Chao-Yang Chiu},
  journal={Expert Syst. Appl.},
  year={2009},
  volume={36},
  pages={3641--3651}
}
Chinese word segmentation is an essential step in Chinese natural language processing because it benefits Chinese text mining and information retrieval. The lexicon-based Chinese word segmentation scheme is currently widely adopted; it can correctly segment sentences in Chinese texts into distinct words in real-world applications. However, the word identification ability of the lexicon-based scheme is highly dependent on a well-prepared lexicon with…
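Lexicon-based segmenters are often built on greedy longest-match rules. What follows is a minimal sketch of one common variant, forward maximum matching. It is not the scheme used by the authors, and the function name and the toy lexicon are hypothetical. The sketch illustrates the dependence the abstract describes: a word missing from the lexicon, such as 部落格 ("blog"), is broken into shorter fragments. Once that word is added, it is recognized whole.

```python
from typing import List, Optional, Set


def forward_maximum_matching(text: str, lexicon: Set[str],
                             max_len: Optional[int] = None) -> List[str]:
    """Greedy left-to-right segmentation (hypothetical helper).

    At each position, take the longest substring found in the lexicon;
    if no lexicon word starts here, emit a single character.
    """
    if max_len is None:
        # Longest lexicon entry bounds the match window (at least 1).
        max_len = max((len(w) for w in lexicon), default=1)
    max_len = max(1, max_len)

    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a lexicon hit,
        # falling back to a single character when nothing matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += length
                break
    return tokens


if __name__ == "__main__":
    base = {"我們", "使用", "部落"}  # toy lexicon, assumed for illustration
    print(forward_maximum_matching("我們使用部落格", base))
    # → ['我們', '使用', '部落', '格']
    print(forward_maximum_matching("我們使用部落格", base | {"部落格"}))
    # → ['我們', '使用', '部落格']
```

Backward maximum matching, which scans right to left, is a common alternative. Either way, segmentation quality is limited by lexicon coverage, and that coverage is what automatically extracted new words are meant to improve.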
