Disease-Disease Relationships for Rheumatic Diseases: Web-Based Biomedical Textmining an Knowledge Discovery to Assist Medical Decision Making


The MEDLINE database (Medical Literature Analysis and Retrieval System Online) contains a rapidly growing volume of biomedical articles. There is an urgent need for techniques that enable the discovery, extraction, integration and use of the knowledge hidden in those articles. Text mining aims at developing technologies that help cope with the interpretation of these large volumes of publications. Co-occurrence analysis is a text-mining technique whose methodologies and statistical models are used to evaluate the significance of the relationship between entities such as disease names, drug names, and keywords in titles, abstracts or even entire publications. In this paper we present a method for, and an evaluation of, knowledge discovery of disease-disease relationships for rheumatic diseases. This has huge medical relevance, since rheumatic diseases affect hundreds of millions of people worldwide and lead to substantial loss of functioning and mobility. In this study, we interviewed medical experts and searched the ACR (American College of Rheumatology) web site in order to select the most frequently observed rheumatic diseases for which to explore disease-disease relationships. We used a web-based text-mining tool to find disease names and their co-occurrence frequencies in MEDLINE articles for each disease. After finding disease names and frequencies, we normalized the names by interviewing medical experts and by utilizing biomedical resources. Frequencies are normally a good indicator of the relevance of a concept, but they tend to overestimate the importance of common concepts. We therefore also used the Pointwise Mutual Information (PMI) measure to discover the strength of a relationship. PMI provides an indication of how much more often the query and concept co-occur than expected by chance. After finding PMI values for each disease, we ranked these values and frequencies together.
The results reveal hidden knowledge in articles regarding rheumatic diseases indexed by MEDLINE, thereby exposing relationships that can provide important additional information for medical experts and researchers for medical decision-making.
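
To make the PMI measure mentioned in the abstract concrete, the following is a minimal sketch of how such a score could be computed from document counts. It is not the authors' implementation: the function name `pmi`, the base-2 logarithm, the concept labels and all counts are assumptions chosen purely for illustration. It uses the standard definition PMI(q, c) = log2( p(q, c) / (p(q) p(c)) ), with probabilities estimated as document counts divided by the corpus size.

```python
import math


def pmi(n_qc: int, n_q: int, n_c: int, n_total: int) -> float:
    """Pointwise mutual information of a query q and a concept c.

    n_qc    -- documents mentioning both q and c
    n_q     -- documents mentioning q
    n_c     -- documents mentioning c
    n_total -- documents in the corpus

    Base-2 logarithm is an assumption; the abstract does not state the base.
    """
    if min(n_qc, n_q, n_c, n_total) <= 0:
        raise ValueError("all counts must be positive")
    # p(q,c) / (p(q) * p(c)) simplifies to (n_qc * n_total) / (n_q * n_c)
    return math.log2((n_qc * n_total) / (n_q * n_c))


# Hypothetical counts (NOT taken from the paper) for one query disease
# that appears in 100 of 1000 documents.
N_TOTAL, N_QUERY = 1000, 100
concepts = {
    # label: (co-occurrence count with the query, total count of the concept)
    "rare concept": (20, 50),
    "common concept": (80, 800),
}

scores = {
    label: pmi(n_qc, N_QUERY, n_c, N_TOTAL)
    for label, (n_qc, n_c) in concepts.items()
}

# Rank concepts by PMI, strongest association first.
for label, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label}: co-occurrences={concepts[label][0]}, PMI={score:.2f}")
```

In this invented example the common concept co-occurs with the query four times as often as the rare one, yet its PMI is 0 because that rate is exactly what chance would predict, while the rare concept scores 2. This is the effect the abstract describes: raw frequencies favour common concepts, and PMI corrects for it.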

DOI: 10.1109/COMPSAC.2012.77

Cite this paper

@article{Holzinger2012DiseaseDiseaseRF, title={Disease-Disease Relationships for Rheumatic Diseases: Web-Based Biomedical Textmining an Knowledge Discovery to Assist Medical Decision Making}, author={Andreas Holzinger and Klaus-Martin Simonic and Pinar Yildirim}, journal={2012 IEEE 36th Annual Computer Software and Applications Conference}, year={2012}, pages={573-580} }