In this paper we begin to investigate how to <i>automatically</i> determine the subjectivity orientation of questions posted by real users in community question answering (CQA) portals. Subjective questions seek answers containing private states, such as personal opinion and experience. In contrast, objective questions request objective, verifiable …
An increasingly popular way to find information online is through Community Question Answering (CQA) portals such as Yahoo! Answers, Naver, and Baidu Knows. Searching the CQA archives, and ranking, filtering, and evaluating the submitted answers, require intelligent processing of the questions and answers posed by the users. One important task is …
Although promising results have been achieved in the areas of traffic-sign detection and classification, few works have provided simultaneous solutions to these two tasks for realistic real-world images. We make two contributions to this problem. First, we have created a large traffic-sign benchmark from 100,000 Tencent Street View panoramas, going beyond …
k is the most important parameter in a text categorization system based on the k-Nearest Neighbor algorithm (kNN). In the classification process, the k documents in the training set nearest to the test document are determined first. Then, the prediction can be made according to the category distribution among these k nearest neighbors. Generally speaking, the class …
Temporal information is useful in many NLP applications, such as information extraction, question answering and summarization. In this paper, we present a temporal parser for extracting and normalizing temporal expressions from Chinese texts. An integrated temporal framework is proposed, which includes basic temporal concepts and the classification of …
Error-Correcting Output Coding (ECOC) is a general framework for multi-class text classification with a set of binary classifiers. It not only helps a binary classifier solve multi-class classification problems, but can also boost the performance of a multi-class classifier. When building each individual binary classifier in ECOC, multiple classes are …
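To illustrate the general ECOC idea mentioned in this abstract, here is a minimal sketch, not the authors' method. It assumes four hypothetical class names and a standard 7-bit exhaustive code matrix; `binary_target`, `hamming`, and `decode` are illustrative helper names. Each column of the matrix defines one binary problem, and a vector of binary predictions is decoded to the class whose codeword is nearest in Hamming distance.

```python
# Hypothetical classes with a 7-bit exhaustive code (assumption for illustration).
# Every pair of codewords differs in 4 positions, so one wrong bit is correctable.
CODE_MATRIX = {
    "sports":   (1, 1, 1, 1, 1, 1, 1),
    "finance":  (0, 0, 0, 0, 1, 1, 1),
    "politics": (0, 0, 1, 1, 0, 0, 1),
    "science":  (0, 1, 0, 1, 0, 1, 0),
}


def binary_target(label, column):
    """Training target (0 or 1) of a class for the binary classifier of one column."""
    return CODE_MATRIX[label][column]


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(CODE_MATRIX, key=lambda label: hamming(CODE_MATRIX[label], bits))


# The "finance" codeword with its second bit flipped by one faulty classifier.
noisy_prediction = (0, 1, 0, 0, 1, 1, 1)
predicted_class = decode(noisy_prediction)
```

In this sketch the binary classifiers themselves are left out; any binary learner could supply the seven bits that `decode` consumes.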
NLM's Unified Medical Language System (UMLS) is a very large ontology of biomedical and health data. In order to be used effectively for knowledge processing, it needs to be customized to a specific domain. In this paper, we present techniques to automatically discover domain-specific concepts, discover relationships between these concepts, build a context …
This paper is a comparative study on representing units in Chinese text categorization. Several kinds of representing units, including byte 3-gram, Chinese character, Chinese word, and Chinese word with part of speech tag, were investigated. Empirical evidence shows that when the size of training data is large enough, representations of higher-level or with …
<i>k</i> is the most important parameter in a text categorization system based on the <i>k</i>-nearest neighbor algorithm (<i>k</i>NN). To classify a new document, the <i>k</i> nearest documents in the training set are determined first. The prediction of categories for this document can then be made according to the category distribution among the <i>k</i> …
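The two kNN steps described in this abstract (find the k nearest training documents, then predict from their category distribution) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's system: documents are bag-of-words dictionaries, similarity is cosine, the prediction is a plain majority vote, and the toy training set and the names `cosine`, `knn_predict`, and `TRAIN` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k most similar training documents.

    `train` is a list of (term_weights, label) pairs.
    """
    # Step 1: determine the k nearest training documents.
    neighbors = sorted(train, key=lambda ex: cosine(ex[0], query), reverse=True)[:k]
    # Step 2: predict from the category distribution among those neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy training set (hypothetical data for illustration only).
TRAIN = [
    ({"goal": 1, "match": 1}, "sports"),
    ({"team": 1, "goal": 1}, "sports"),
    ({"stock": 1, "market": 1}, "finance"),
    ({"market": 1, "bank": 1}, "finance"),
]

prediction = knn_predict(TRAIN, {"goal": 1, "team": 1}, k=3)
```

The value of `k` is passed in explicitly because, as the abstract notes, it is the parameter that most affects the outcome; a different `k` can change which categories dominate the vote.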