Bei Shi

Reader comments on many online news websites are typically massive in volume. We investigate the task of Cultural-common Topic Detection (CTD), which aims to discover common discussion topics from news reader comments written in different languages. We propose a new probabilistic graphical model called MCTA which can cope with the ...
Many existing review spam detection methods that consider text content use only simple text features such as content similarity. We explore a novel idea of exploiting text generality to improve spam detection. Apart from the task of review spam detection, although there has also been some work on identifying review spammers (users) ...
Human immunodeficiency virus type 1 (HIV-1) drug resistance and the latent reservoir are the two major obstacles to effectively controlling and curing HIV-1 infection. Therefore, it is critical to develop therapeutic strategies specifically targeting these two obstacles. Recently, we described a novel anti-HIV approach based on a modified human intrinsic ...
Entity Set Expansion (ESE) aims at automatically acquiring instances of a specific target category. Unfortunately, traditional ESE methods usually suffer from the expansion boundary problem and the semantic drift problem. To resolve these two problems, this paper proposes a probabilistic Co-Bootstrapping method, which can accurately determine the expansion ...
State-of-the-art Chinese word segmentation systems have achieved high performance on well-formed long documents. However, segmenting microblog text is difficult due to the noise problem and the out-of-vocabulary (OOV) problem. In this paper, we present a Chinese Micro-Blog Segmentation system for the CIP-SIGHAN Word Segmentation Bakeoff 2012 track. The proposed system ...