

We explore methods for extracting information from clinical narratives captured by HealthLink, a public health telephone consultation service. The available data consist of dialogues recorded by nurses while they consult with patients on the phone. Because the nurses transcribe these interviews during the conversations, the data contain a large volume and variety of noise, which we divide into two kinds. Explicit noise includes spelling errors, unfinished sentences, missing sentence delimiters, and term variants. Implicit noise includes information that is not about the patient and negated statements about the patient. To filter explicit noise, we propose a biomedical term detection and normalization method that resolves misspellings, term variations, and the arbitrary abbreviations that nurses use. To detect temporal terms and other named entities that carry patients' personal information, such as age and sex, we propose bootstrapping-based pattern learning that captures arbitrary variations of those entities. To address implicit noise, we propose a dependency path-based filtering method. The output of this de-noising is normalized patient information. Experimental results show that our noise reduction methods achieve reasonable performance.
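As a rough illustration of what normalizing explicit noise can look like, the following minimal sketch maps misspelled or abbreviated terms to canonical lexicon entries. It is not the authors' method: the abstract does not describe their algorithm, so this substitutes simple fuzzy string matching from Python's standard `difflib` module. The names `LEXICON`, `ABBREVIATIONS`, and `normalize_term`, the sample terms, and the `0.8` cutoff are all assumptions made for the example.

```python
import difflib

# Hypothetical lexicon and abbreviation table, for illustration only;
# a real system would draw these from a biomedical vocabulary.
LEXICON = ["headache", "diarrhea", "vomiting", "fever", "abdominal pain"]
ABBREVIATIONS = {"abd pain": "abdominal pain", "h/a": "headache"}


def normalize_term(raw, cutoff=0.8):
    """Map a noisy term to a lexicon entry, or return None if nothing is close."""
    # Lowercase and collapse runs of whitespace before any lookup.
    term = " ".join(raw.lower().split())
    # Known abbreviations are resolved by direct lookup.
    if term in ABBREVIATIONS:
        return ABBREVIATIONS[term]
    # Exact lexicon hits need no fuzzy matching.
    if term in LEXICON:
        return term
    # Otherwise take the single closest lexicon entry above the similarity cutoff.
    matches = difflib.get_close_matches(term, LEXICON, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Under these assumptions, `normalize_term("headach")` resolves to `"headache"` and `normalize_term("H/A")` is resolved through the abbreviation table, while a string with no close lexicon entry yields `None`. A similarity cutoff trades recall for precision, so it would need tuning against real transcripts.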

Cite this paper

@inproceedings{Kim2013MicrosoftW, title={Microsoft Word - camera-ready.docx}, author={Mi-Young Kim and Ying Xu and Osmar R. Za{\"i}ane and Randy Goebel}, year={2013} }