This paper presents WIA-Opinmine, an opinion analysis system developed by the CUHK_PolyU_Tsinghua Web Information Analysis Group (WIA) for the NTCIR-7 MOAT task. Unlike most existing opinion mining systems, which recognize opinionated sentences in a one-step classification procedure, WIA-Opinmine adopts a multi-pass coarse-fine analysis strategy. A …
This paper presents Opinmine, the CUHK opinion analysis system for the NTCIR-6 pilot task. Opinmine comprises three functional modules: (1) the Preprocessing and Assignment Module (PAM) performs word segmentation, part-of-speech (POS) tagging and named entity recognition on the input Chinese text. It is based on a lexicalized Hidden Markov Model and …
Lyric-based song sentiment classification seeks to assign songs appropriate sentiment labels, such as light-hearted and heavy-hearted. Four problems render the vector space model (VSM)-based text classification approach ineffective: 1) many words within song lyrics actually contribute little to sentiment; 2) nouns and verbs used to express sentiment are …
This paper presents the WIA-Opinmine system developed by the CUHK_Tsinghua Web Information Analysis (WIA) Virtual Research Center for the NTCIR-8 MOAT task. The system is distinctive in three respects. First, it handles Simplified Chinese and Traditional Chinese at the same time. A tool is developed to convert Traditional Chinese into Simplified …
Tweet clustering poses two notable challenges. First, data sparsity is severe, since no tweet can be longer than 140 characters. Second, synonymy and polysemy are rather common, because users tend to express a single meaning in a great number of ways. Inspired by recent research indicating that Wikipedia is …
Manually labeling documents for training a text classifier is expensive and time-consuming. Moreover, a classifier trained on labeled documents may suffer from overfitting and adaptability problems. Dataless text classification (DLTC) has been proposed as a solution to these problems, since it does not require labeled documents. Previous research in DLTC …