Sung-Hyon Myaeng

While naive Bayes is quite effective in various data mining tasks, it performs disappointingly on automatic text classification. From an analysis of how naive Bayes behaves on natural language text, we found a serious problem in its parameter estimation process that causes poor results in the text classification domain. In this paper, we …
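The abstract does not say what the parameter estimation problem is or how the authors fix it. For orientation only, the sketch below shows the textbook multinomial naive Bayes estimation step that such work starts from: class priors plus per-class word probabilities with additive (Laplace) smoothing. It is an assumption-laden illustration, not the paper's method. The function names `train_nb` and `classify`, the `alpha` parameter, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, alpha=1.0):
    """Estimate multinomial naive Bayes parameters (hypothetical helper).

    docs: list of (tokens, label) pairs.
    alpha: additive (Laplace) smoothing constant.
    Returns class priors, per-class word probabilities, and the vocabulary.
    """
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)

    n_docs = len(docs)
    priors = {c: class_counts[c] / n_docs for c in class_counts}

    likelihoods = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        denom = total + alpha * len(vocab)
        # Smoothed estimate: (count(w, c) + alpha) / (total words in c + alpha * |V|)
        likelihoods[c] = {w: (word_counts[c][w] + alpha) / denom for w in vocab}
    return priors, likelihoods, vocab


def classify(tokens, priors, likelihoods, vocab):
    """Return the label with the highest log posterior; unseen words are ignored."""
    best_label, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for w in tokens:
            if w in vocab:
                score += math.log(likelihoods[c][w])
        if score > best_score:
            best_label, best_score = c, score
    return best_label


# Toy corpus, made up for illustration.
docs = [
    ("cheap pills buy now".split(), "spam"),
    ("meeting agenda for monday".split(), "ham"),
    ("buy cheap watches".split(), "spam"),
    ("monday project meeting notes".split(), "ham"),
]
priors, likelihoods, vocab = train_nb(docs)
print(classify("buy cheap".split(), priors, likelihoods, vocab))
```

Working in log space avoids floating-point underflow when many word probabilities are multiplied. The smoothing constant keeps a word that is unseen in one class from driving that class's probability to zero.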
With the growing interest in semantic web services and context-aware computing, the importance of ontologies, which enable context-aware reasoning, has become widely accepted. While domain-specific and general-purpose ontologies have been developed, few attempts have been made to build a situation ontology that can be employed directly to support …
In traditional information retrieval (IR) systems, a document as a whole is the target of a query. With increasing interest in structured documents such as SGML documents, there is a growing need to build an IR system that can retrieve parts of documents that satisfy not only content-based but also structure-based requirements. In this paper, we describe …
As the WWW grows at an increasing speed, classifiers targeted at hypertext have come into high demand. While document categorization is a fairly mature field, the use of hypertext structure and hyperlinks has remained relatively unexplored. In this paper, we propose a practical method for enhancing both the speed and the quality of hypertext categorization …
This report is an overview of the Cross-Language Information Retrieval (CLIR) Task at the third NTCIR Workshop. The CLIR task has three tracks: Single Language IR (SLIR), Bilingual CLIR (BLIR), and Multilingual CLIR (MLIR). The scope, schedule, test collections, search results, relevance judgments, scoring results, and preliminary analyses are described in the …
The purpose of this paper is to give an overview of research efforts at the NTCIR-5 CLIR task, a project of large-scale retrieval experiments on cross-lingual information retrieval (CLIR) of Chinese, Japanese, Korean, and English. The project has three sub-tasks, multilingual IR (MLIR), bilingual IR (BLIR), and single language IR (SLIR), in which many …
An easy way of translating queries from one language into another for cross-language information retrieval (IR) is to use a simple bilingual dictionary. Because such dictionaries are general-purpose, however, this simple method yields a severe translation ambiguity problem. This paper describes the degree to which this problem arises in …
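The ambiguity problem the abstract describes comes from naive word-by-word lookup. Each query term is replaced by every translation the dictionary lists, so the number of possible translated queries is the product of the per-term candidate counts. The sketch below illustrates this under stated assumptions: `TOY_DICT` is a made-up English-to-French fragment, not a real dictionary, and `translate_all` and `ambiguity` are hypothetical helpers. The paper's own measurements are not reproduced here.

```python
from itertools import product
from math import prod

# Toy bilingual dictionary: hypothetical entries for illustration only.
TOY_DICT = {
    "bank": ["banque", "rive"],
    "interest": ["intérêt"],
    "rate": ["taux", "tarif", "vitesse"],
}


def translate_all(query_terms, dictionary):
    """Replace each term with all of its dictionary translations (no disambiguation).

    Terms missing from the dictionary are kept untranslated.
    """
    return [dictionary.get(t, [t]) for t in query_terms]


def ambiguity(query_terms, dictionary):
    """Count the distinct word-by-word translations of the whole query."""
    return prod(len(cands) for cands in translate_all(query_terms, dictionary))


query = ["bank", "interest", "rate"]
candidates = translate_all(query, TOY_DICT)
print(ambiguity(query, TOY_DICT))  # 2 * 1 * 3 = 6
for combo in product(*candidates):
    print(" ".join(combo))
```

Even this three-term query yields six candidate translations, and only one of them matches the intended meaning. Because the count grows multiplicatively with query length, dictionary-based CLIR usually needs some form of translation disambiguation.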
The purpose of this paper is to give an overview of research efforts at the NTCIR-4 CLIR task, a project of large-scale retrieval experiments on cross-lingual information retrieval (CLIR) of Chinese, Japanese, Korean, and English. The project has four sub-tasks: multilingual IR (MLIR), bilingual IR (BLIR), pivot bilingual IR (PLIR) and …
Subject or propositional content has been the focus of most classification research. Genre or style, on the other hand, is a different and important property of text, and automatic genre classification is becoming important for classification and retrieval, as well as for some natural language processing research. In this paper, we present a …