Learn More
In vector space model (VSM), text representation is the task of transforming the content of a textual document into a vector in the term space so that the document could be recognized and classified by a computer or a classifier. Different terms (i.e. words, phrases, or any other indexing units used to identify the contents of a text) have different(More)
Web 2.0 technology has enabled more and more people to freely express their opinions on the Web, making the Web an extremely valuable source for mining user opinions about all kinds of topics. In this paper we study how to automatically integrate opinions expressed in a well-written expert review with lots of opinions scattering in various sources such as(More)
In this paper, we define and study a new opinionated text data analysis problem called Latent Aspect Rating Analysis (LARA), which aims at analyzing opinions expressed about an entity in an online review at the level of topical aspects to discover each individual reviewer's latent opinion on each aspect as well as the relative emphasis on different aspects(More)
The explosion of Web opinion data has made essential the need for automatic tools to analyze and understand people's sentiments toward different topics. In most sentiment analysis applications, the sentiment lexicon plays a central role. However, it is well known that there is no universally optimal sentiment lexicon since the polarity of words is sensitive(More)
Probabilistic topic models have recently attracted much attention because of their successful applications in many text mining tasks such as retrieval, summarization, categorization, and clustering. Although many existing studies have reported promising performance of these topic models, none of the work has systematically investigated the task performance(More)
Mining detailed opinions buried in the vast amount of review text data is an important, yet quite challenging task with widespread applications in multiple domains. <i>Latent Aspect Rating Analysis</i> (LARA) refers to the task of inferring both opinion ratings on topical aspects (e.g., location, service of a hotel) and the relative weights reviewers have(More)
A nearest-neighbor chain (NNC) based approach is proposed in this paper to develop a skew estimation method with a high accuracy and with language-independent capability. Size restriction is introduced to the detection of nearest-neighbors (NN). Then NNCs are extracted from the adjacent NN pairs, in which the slopes of the NNCs with a largest possible(More)
—With the rising popularity and importance of document images as an information source, information retrieval in document image databases has become a growing and challenging problem. In this paper, we propose an approach with the capability of matching partial word images to address two issues in document image retrieval: word spotting and similarity(More)
An approach to word image matching based on weighted Hausdorff distance(WHD) is proposed in this paper to facilitate the detection and location of the user-specified words in the document images. Preprocessing such as eliminating the space between adjacent characters in the word images and scale normalization is first done before the WHD is utilized to(More)