Thanaruk Theeramunkong

Learn More
Mining generalized frequent patterns of generalized association rules is an important process in knowledge discovery system. In this paper, we propose a new approach for efficiently mining all frequent patterns using a novel set enumeration algorithm with two types of constraints on two generalized itemset relationships, called subset-superset and(More)
Generalized association rule mining is an extension of traditional association rule mining to discover more informative rules, given a taxonomy. In this paper, we describe a formal framework for the problem of mining generalized association rules. In the framework, The subset-superset and the parent-child relationships among generalized itemsets are(More)
Most of traditional text categorization approaches utilize term frequency (tf) and inverse document frequency (idf) for representing importance of words and/or terms in classifying a text document. This paper describes an approach to apply term distributions, in addition to tf and idf, to improve performance of centroid-based text categorization. Three(More)
Predicting protein subcellular location is one of major challenges in Bioinformatics area since such knowledge helps us understand protein functions and enables us to select the targeted proteins during drug discovery process. While many computational techniques have been proposed to improve predictive performance for protein subcellular location, they have(More)
For languages without word boundary delimiters, dictionaries are needed for segmenting running texts. This figure makes segmentation accuracy depend significantly on the quality of the dictionary used for analysis. If the dictionary is not sufficiently good, it will lead to a great number of unknown or unrecognized words. These unrecognized words certainly(More)
Most previous works on text categorization applied term occurrence frequency and inverse document frequency for representing importance of terms. This work presents an analysis of inverse class frequency in centroid-based text categorization. There are two aims of this paper. The first one is to find appropriate functions of inverse class frequency. The(More)
Some languages including Thai, Japanese and Chinese do not have explicit word boundary. This causes the problem of word boundary ambiguity that results in decreasing the accuracy of information retrieval. This paper proposes a new technique so-called character clustering to reduce the ambiguity of word boundary in Thai documents and hence improve searching(More)