Applying Machine Learning to Text Segmentation for Information Retrieval


We propose a self-supervised word segmentation technique for text segmentation in Chinese information retrieval. This method combines the advantages of traditional dictionary-based, character-based, and mutual-information-based approaches while overcoming many of their shortcomings. Experiments on TREC data show that this method is promising. Our method is…
DOI: 10.1023/A:1026028229881



