Automatic text classification is one of the most important tools in Information Retrieval. This paper presents a novel text classifier that learns from positive and unlabeled examples. Compared with the classical text classification problem, the primary challenge here is that no labeled negative documents are available in the training set.
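The abstract does not say how the classifier copes with the missing negatives. One common family of positive-unlabeled (PU) methods works in two steps: first extract "reliable negatives" from the unlabeled set, then train an ordinary classifier on the positives versus those negatives. The sketch below illustrates only that generic idea and is not the paper's method. The bag-of-words vectors, the cosine threshold of 0.1, the nearest-centroid (Rocchio-style) classifier, the toy documents, and all function names are assumptions chosen for brevity.

```python
from collections import Counter
import math


def vectorize(doc):
    """Bag-of-words term frequencies (a deliberately simple representation)."""
    return Counter(doc.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term->weight mappings."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Average of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}


def train_pu(positive_docs, unlabeled_docs, threshold=0.1):
    """Generic two-step PU sketch (assumed, not the paper's algorithm):
    1. unlabeled docs far from the positive centroid become reliable negatives;
    2. a nearest-centroid classifier separates positives from them."""
    pos_c = centroid([vectorize(d) for d in positive_docs])
    unlabeled = [vectorize(d) for d in unlabeled_docs]
    reliable_neg = [u for u in unlabeled if cosine(u, pos_c) < threshold]
    if not reliable_neg:
        raise ValueError("no reliable negatives found; try a higher threshold")
    neg_c = centroid(reliable_neg)

    def classify(doc):
        v = vectorize(doc)
        return cosine(v, pos_c) >= cosine(v, neg_c)  # True means "positive"

    return classify


# Hypothetical toy data: finance news is the positive class.
positive = ["stock market shares rise", "market shares fall on stock news"]
unlabeled = ["football match ends in draw", "stock shares climb",
             "team wins football cup"]
clf = train_pu(positive, unlabeled)
```

Here the two football documents share no terms with the positive centroid and become reliable negatives, while "stock shares climb" stays unlabeled; `clf("shares of the stock")` then returns `True` and `clf("football cup final")` returns `False`.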
Sentiment classification aims to mine people's reviews of a particular event, topic, or product by automatically classifying those reviews as positive or negative opinions. With the rapid development of World Wide Web applications, sentiment classification has great potential to help people automatically analyze customers' opinions on the web …
This paper presents a new edge-counting method that uses WordNet to compute word similarity. The resulting similarity agrees closely with human ratings and effectively simulates the human thought process: people tend to weigh differences more heavily when the semantic distance between two words is small, and less heavily when it is large. Finally, we weigh up …
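As a point of reference, the plain edge-counting idea scores two words by the length of the shortest is-a path between them in a taxonomy, for example `1 / (1 + path_length)`, which is the form NLTK's WordNet `path_similarity` uses. The sketch below implements only that baseline on a small hypothetical taxonomy standing in for WordNet (each word has a single parent). It does not reproduce the paper's refinement, whose weighting of differences by semantic distance is not specified in the abstract; `TAXONOMY` and the function names are illustrative assumptions.

```python
# Hypothetical is-a taxonomy (child -> parent), standing in for WordNet.
TAXONOMY = {
    "dog": "canine", "wolf": "canine", "canine": "carnivore",
    "cat": "feline", "feline": "carnivore", "carnivore": "mammal",
    "mammal": "animal", "bird": "animal",
}


def ancestors(word, parent):
    """Map `word` and each of its ancestors to their distance in edges."""
    dist, d = {}, 0
    while word is not None:
        dist[word] = d
        word = parent.get(word)
        d += 1
    return dist


def path_length(a, b, parent):
    """Shortest path between a and b through a common ancestor, or None."""
    da, db = ancestors(a, parent), ancestors(b, parent)
    common = set(da) & set(db)
    if not common:
        return None
    return min(da[c] + db[c] for c in common)


def edge_similarity(a, b, parent):
    """Baseline edge-counting similarity: 1 / (1 + shortest path length)."""
    n = path_length(a, b, parent)
    return 0.0 if n is None else 1.0 / (1 + n)
```

With this baseline, "dog" and "wolf" (two edges apart) score 1/3, "dog" and "cat" (four edges apart) score 0.2, and a word missing from the taxonomy scores 0.0. Raw path counts like these are the starting point that the paper's method refines to fit human ratings.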
Automatic text classification is one of the most important tools in Information Retrieval. Because traditional text classification methods cannot find the best feature set, a genetic algorithm (GA) is applied to feature selection, since it searches for a globally optimal solution. This paper presents a novel text classifier built from positive and unlabeled documents based on …
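A GA for feature selection typically encodes each candidate feature subset as a bit mask (1 = keep the feature) and evolves a population of masks toward higher fitness. The sketch below shows that generic scheme with tournament selection, one-point crossover, bit-flip mutation, and elitism; the abstract does not say which operators or fitness function the paper uses. The parameter defaults, the function name `evolve_feature_mask`, and the toy fitness are assumptions. In a real text classifier the fitness would more plausibly be a validation score of a classifier trained on the selected terms.

```python
import random


def evolve_feature_mask(n_features, fitness, pop_size=20, generations=40,
                        crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Generic GA sketch: search bit masks that maximize fitness(mask).
    Returns (best_mask, best_score, per-generation best scores)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    history = []

    def tournament(scored):
        # Pick the fitter of two randomly chosen (score, mask) pairs.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in pop]
        best_score, best = max(scored, key=lambda s: s[0])
        history.append(best_score)
        next_pop = [best[:]]  # elitism: the best mask survives unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop

    scored = [(fitness(ind), ind) for ind in pop]
    best_score, best = max(scored, key=lambda s: s[0])
    return best, best_score, history


# Hypothetical toy fitness: features 0, 2 and 5 are "informative";
# every other selected feature costs half a point.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask):
    chosen = {i for i, bit in enumerate(mask) if bit}
    return len(chosen & INFORMATIVE) - 0.5 * len(chosen - INFORMATIVE)


best, score, hist = evolve_feature_mask(8, toy_fitness)
```

Because of elitism, the best score per generation never decreases, and a fixed `seed` makes runs reproducible. Like any GA, this is a stochastic search, so the sketch itself offers no proof that the returned mask is optimal.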
In the deep web, a significant amount of information can be accessed only through the query interfaces of back-end databases. However, general-purpose search engines cannot interact with these query interfaces, so a vast amount of hidden, invisible information remains inaccessible. Therefore, this paper proposes a novel method of filling in the forms of deep web entry points by …
When integrating web databases, the very first challenge is to understand what a query interface says, that is, what query capabilities a source supports. People viewing a web page are not concerned with its internal structure; in most cases, they identify semantic blocks through visual elements. Therefore, in this paper, a novel algorithm for …
Topical Web crawling is an established technique for domain-specific information retrieval. However, almost all conventional topical Web crawlers are built on classifiers, which require large amounts of labeled training data that are very difficult to label manually. This paper presents a novel approach called clustering-based topical …