The labeled corpus construction of Chinese subjectivity texts
- Hongyan Song, Jun Liu, Tianfang Yao, Quansheng Liu, Gaohui Huang
- Journal of Chinese Information Processing,
This paper proposes a grammar-based unsupervised method for automatically mining Chinese volitive words, which are important clues to intention and desire in text, such as “can”, “must”, and “rather than”. The paper also introduces a scheme for manually tagging volitive words in large-scale Chinese blogs, and the tagged blogs serve as the corpus for evaluating the unsupervised method. Experiments show a precision of 74.25% and a recall of 76.03%. Building on this method, the paper constructs a statistical model that uses the trend of the mining to acquire all the volitive words, which further improves performance.
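The evaluation described above compares automatically mined words against manually tagged ones. A minimal sketch of how precision and recall could be computed for such a comparison follows; the function name `precision_recall` and the toy word sets are hypothetical and are not taken from the paper, which does not specify its exact matching procedure.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) of a predicted word set against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)  # words found in both sets
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data for illustration only, not the paper's corpus.
mined = {"can", "must", "rather than", "very"}
tagged = {"can", "must", "rather than", "should", "want"}
precision, recall = precision_recall(mined, tagged)
print(f"precision={precision:.2%} recall={recall:.2%}")
```

Here precision is the share of mined words that annotators also tagged, and recall is the share of tagged words that the miner recovered.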