An Effective Approach for Web Document Classification using the Concept of Association Analysis of Data Mining

Abstract

The exponential growth of the web has increased the importance of web document classification and data mining. Obtaining exact information about which classes a web document belongs to is expensive. Automatic classification of web documents, which provides this information at low cost, is of great use to search engines. In this paper, we propose an approach for classifying web documents using the frequent word sets generated by Frequent Pattern (FP) Growth, an association analysis technique from data mining. These sets of associated words act as the feature set. The final classification is obtained by applying a Naïve Bayes classifier to the feature set. For the experimental work, we use the Gensim package, as it is simple and robust. Results show that our approach can classify web documents effectively.

Keywords: Classification, FP-growth, Gensim, Naïve Bayes, Vector space model
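
The pipeline described in the abstract (frequent word sets as features, followed by Naïve Bayes) can be illustrated with a minimal sketch. This is not the authors' implementation. It uses only the Python standard library rather than Gensim, it swaps FP-Growth for a brute-force miner that yields the same frequent word sets up to a size cap, and it assumes a Bernoulli Naïve Bayes with Laplace smoothing. The toy documents, the `min_support` value, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def frequent_word_sets(docs, min_support, max_size=2):
    """Brute-force frequent itemset mining (a stand-in for FP-Growth).

    Returns every word set of up to max_size words that occurs in at
    least a min_support fraction of the documents.
    """
    counts = Counter()
    for words in docs:
        unique = sorted(set(words))
        for size in range(1, max_size + 1):
            counts.update(combinations(unique, size))
    return {frozenset(s) for s, c in counts.items() if c / len(docs) >= min_support}


def featurize(words, features):
    """Binary vector: 1 if the document contains every word of the set."""
    present = set(words)
    return [1 if s <= present else 0 for s in features]


def train_nb(X, y):
    """Bernoulli Naive Bayes with Laplace smoothing (an assumed variant)."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        probs = [(sum(col) + 1) / (len(rows) + 2) for col in zip(*rows)]
        model[c] = (prior, probs)
    return model


def predict_nb(model, x):
    """Return the class with the highest log-posterior score."""
    def log_score(c):
        prior, probs = model[c]
        return math.log(prior) + sum(
            math.log(p if v else 1 - p) for p, v in zip(probs, x)
        )
    return max(model, key=log_score)


# Hypothetical toy corpus: tokenized documents with class labels.
corpus = [
    (["football", "match", "goal", "team"], "sports"),
    (["team", "goal", "league", "match"], "sports"),
    (["match", "team", "player", "goal"], "sports"),
    (["stock", "market", "price", "bank"], "finance"),
    (["bank", "market", "stock", "loan"], "finance"),
    (["market", "stock", "bank", "trade"], "finance"),
]
docs = [words for words, _ in corpus]
labels = [label for _, label in corpus]

# Mine frequent word sets over the whole corpus (an assumption; they
# could also be mined per class) and fix an order for the feature vector.
itemsets = frequent_word_sets(docs, min_support=0.5)
features = sorted(itemsets, key=lambda s: (len(s), sorted(s)))

model = train_nb([featurize(d, features) for d in docs], labels)
print(predict_nb(model, featurize(["goal", "team", "match", "referee"], features)))
```

Capping the word-set size at two keeps the brute-force miner tractable on a toy corpus. On a realistic vocabulary the number of candidate sets grows combinatorially, which is the cost FP-Growth avoids by mining a compact prefix tree instead of enumerating candidates.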

Cite this paper

@article{Roul2012AnEA, title={An Effective Approach for Web Document Classification using the Concept of Association Analysis of Data Mining}, author={Rajendra Kumar Roul and Sanjay Kumar Sahay}, journal={CoRR}, year={2012}, volume={abs/1406.5616} }