While naive Bayes is quite effective in various data mining tasks, it gives disappointing results in automatic text classification. By observing how naive Bayes behaves on natural language text, we found a serious problem in the parameter estimation process, which causes poor results in the text classification domain. In this paper, we ...
Subject or propositional content has been the focus of most classification research. Genre or style, on the other hand, is a different and important property of text, and automatic text genre classification is becoming important for classification and retrieval purposes as well as for some natural language processing research. In this paper, we present a ...
This paper describes a sense disambiguation method for a polysemous target noun using the context words surrounding the target noun and its WordNet relatives, such as synonyms, hypernyms and hyponyms. The result of sense disambiguation is a relative that can substitute for that target noun in a context. The selection is made based on co-occurrence ...
The purpose of this paper is to overview research efforts at the NTCIR-5 CLIR task, which is a project of large-scale retrieval experiments on cross-lingual information retrieval (CLIR) of Chinese, Japanese, Korean, and English. The project has three sub-tasks, multilingual IR (MLIR), bilingual IR (BLIR), and single language IR (SLIR), in which many ...
The purpose of this paper is to overview research efforts at the NTCIR-4 CLIR task, which is a project of large-scale retrieval experiments on cross-lingual information retrieval (CLIR) of Chinese, Japanese, Korean, and English. The project has four sub-tasks, multilingual IR (MLIR), bilingual IR (BLIR), pivot bilingual IR (PLIR) and single language IR ...
As the WWW grows at an increasing speed, classifiers targeted at hypertext have come into high demand. While document categorization is quite mature, the issue of utilizing hypertext structure and hyperlinks has been relatively unexplored. In this paper, we propose a practical method for enhancing both the speed and the quality of hypertext categorization ...
Genre characterizes text differently than the usual subject or propositional content that has been the focus of most information retrieval and classification research. We developed a new method for automatic genre classification that is based on statistically selected features obtained from both subject-classified and genre-classified training data. The ...