Verayuth Lertnattee

Most traditional text categorization approaches use term frequency (tf) and inverse document frequency (idf) to represent the importance of words and/or terms when classifying a text document. This paper describes an approach that applies term distributions, in addition to tf and idf, to improve the performance of centroid-based text categorization. Three...
Herbal information is a special type of information dealing with medicinal herbs. Several problems may occur when general-purpose search engines are used to find herbal information on the Internet. Firstly, they often return too many results, some of which are unrelated to the query. Secondly, several documents should be retrieved but not retrieve...
Most previous work on text categorization has applied term occurrence frequency and inverse document frequency to represent the importance of terms. This work presents an analysis of inverse class frequency in centroid-based text categorization. The paper has two aims. The first is to find appropriate functions of inverse class frequency. The...
Centroid-based text classification is one of the most popular supervised approaches for classifying texts into a set of predefined classes. Because it is based on the vector-space model, its performance depends particularly on how important terms in documents are weighted and selected to construct a prototype class vector for each class. In the...
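As an illustration of the general idea in this abstract (not the authors' method), the sketch below builds one prototype vector per class from tf-idf weighted documents and assigns a new document to the class whose prototype is most similar by cosine similarity. The function names, the `log(N/df) + 1` idf variant, and the toy documents are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse tf-idf vectors (dicts)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed idf variant: log(N / df) + 1, so terms present in every
    # document still keep a small positive weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def normalize(vec):
    """Scale a sparse vector to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def train_centroids(docs, labels):
    """Sum the normalized document vectors of each class, then normalize."""
    vectors, idf = tfidf_vectors(docs)
    sums = defaultdict(lambda: defaultdict(float))
    for vec, label in zip(vectors, labels):
        for t, w in normalize(vec).items():
            sums[label][t] += w
    centroids = {label: normalize(s) for label, s in sums.items()}
    return centroids, idf


def classify(tokens, centroids, idf):
    """Return the class whose prototype has the highest cosine similarity."""
    counts = Counter(tokens)
    # Terms unseen during training have no idf and are ignored.
    vec = normalize({t: c * idf[t] for t, c in counts.items() if t in idf})

    def cosine(centroid):
        return sum(w * centroid.get(t, 0.0) for t, w in vec.items())

    return max(centroids, key=lambda label: cosine(centroids[label]))


# Hypothetical toy data, only to show the calling convention.
train_docs = [
    ["herb", "tea", "leaf"],
    ["herb", "root", "tea"],
    ["drug", "dose", "tablet"],
    ["drug", "tablet", "label"],
]
train_labels = ["herbal", "herbal", "drug", "drug"]
centroids, idf = train_centroids(train_docs, train_labels)
predicted = classify(["tea", "herb"], centroids, idf)
```

Because both the document vector and the prototypes are unit length, the dot product in `classify` equals cosine similarity. Other weighting or term-selection schemes would only change how the sparse vectors are built.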
Automatic text classification for Web collections is a non-trivial task. Since Thai academic Web pages usually present technical articles, they may contain many technical terms in both Thai and English. This paper presents two approaches to the problem of a large number of unique terms in a Web page: 1) term weighting schemes and 2) schemes using Web link...
This paper proposes a multidimensional model for classifying drug information text documents. The concept of a multidimensional category model is introduced for representing classes. In contrast with traditional flat and hierarchical category models, the multidimensional category model classifies each document using multiple predefined sets of categories,...