Conceptualization Effects on MEDLINE Documents Classification Using Rocchio Method

Abstract

This paper proposes a supervised text classification method for the biomedical domain that uses semantic resources. We choose Rocchio, a traditional text classification method, for its scalability and its extensibility with semantic knowledge. Semantic aspects are integrated into Rocchio through a conceptualization task, which maps terms extracted from text to their corresponding concepts in the UMLS® Metathesaurus® so that meaning is taken into account during classification. The proposed classifier is tested on the Ohsumed text corpus, which is composed of abstracts of biomedical articles retrieved from the MEDLINE® database. The effects of conceptualization on Rocchio's performance are discussed with respect to different standard similarity measures and a variety of conceptualization strategies.
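
The idea of conceptualization before Rocchio classification can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy term-to-concept table `TOY_CONCEPTS` (its identifiers are invented and are not real UMLS concept identifiers), the strategy names `"none"`, `"replace"` and `"add"`, raw term-frequency weighting, cosine as the similarity measure, and a simplified centroid-only form of Rocchio that omits the negative-example term.

```python
import math
from collections import Counter, defaultdict

# Hypothetical term-to-concept table; identifiers are invented, not real UMLS CUIs.
TOY_CONCEPTS = {
    "heart": "C_HEART",
    "cardiac": "C_HEART",
    "tumor": "C_NEOPLASM",
    "neoplasm": "C_NEOPLASM",
}


def conceptualize(tokens, strategy):
    """Map tokens to concepts under an assumed strategy name."""
    if strategy not in ("none", "replace", "add"):
        raise ValueError(f"unknown strategy: {strategy}")
    out = []
    for tok in tokens:
        concept = TOY_CONCEPTS.get(tok)
        if concept is None or strategy == "none":
            out.append(tok)             # keep the raw term
        elif strategy == "replace":
            out.append(concept)         # concept replaces the term
        else:
            out.extend([tok, concept])  # "add": keep term and concept
    return out


def vectorize(text, strategy):
    """Bag-of-features vector with raw frequency weights."""
    return Counter(conceptualize(text.lower().split(), strategy))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(docs, strategy):
    """Average the vectors of each class (centroid-only Rocchio)."""
    sums = defaultdict(Counter)
    counts = Counter()
    for text, label in docs:
        sums[label].update(vectorize(text, strategy))
        counts[label] += 1
    return {
        label: {k: v / counts[label] for k, v in vec.items()}
        for label, vec in sums.items()
    }


def classify(text, centroids, strategy):
    """Assign the class whose centroid is most similar to the document."""
    vec = vectorize(text, strategy)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Made-up training documents, for illustration only.
docs = [
    ("cardiac arrest treatment", "cardio"),
    ("tumor growth study", "onco"),
]
```

In this toy setting, the query "heart failure" shares no raw term with the "cardio" training document, so its similarity to that centroid is zero under `"none"`. Under `"replace"`, both "heart" and "cardiac" map to the same made-up concept, so the similarity becomes positive. This is the kind of effect that conceptualization is meant to produce, shown here only on invented data.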

DOI: 10.1109/WI-IAT.2012.210

Cite this paper

@article{Albitar2012ConceptualizationEO,
  title={Conceptualization Effects on MEDLINE Documents Classification Using Rocchio Method},
  author={Shereen Albitar and S{\'e}bastien Fournier and Bernard Espinasse},
  journal={2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology},
  year={2012},
  volume={1},
  pages={462-466}
}