HokuMed in NTCIR-11 MedNLP-2: Automatic Extraction of Medical Complaints from Japanese Health Records Using Machine Learning and Rule-based Methods

Abstract

A conditional random fields (CRF) model was trained to detect medical complaints in Japanese health record text. The text was tokenised with the dependency parser CaboCha. The CRF model was trained on each token's context window of two preceding and three following tokens. It also used features for part-of-speech, vocabulary mapping, header name, frequent suffix, orthography and the presence of a modality cue. Modality detection relied on dictionaries of cues for negation, suspicion and family. The scope of negation and suspicion cues was determined by rules based on the output of CaboCha. Negation and family cues were gathered by scanning the development corpus, whereas suspicion cues were obtained by translating English cues. The best result for recognizing complaints was a precision of 87% and a recall of 77%. For modality detection, the positive modality was detected with a precision of 87% and a recall of 77%, negation with a precision of 76% and a recall of 69%, suspicion with a precision of 49% and a recall of 51%, and family with a precision of 78% and a recall of 81%.
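The windowed token features described above can be illustrated with a short sketch. It assumes the text has already been tokenised (for example by CaboCha) and that part-of-speech tags are available. It builds one feature dictionary per token from the two preceding and three following tokens, padding positions that fall outside the sentence. The function names, the `<PAD>` symbol, the feature-key format and the tiny `NEGATION_CUES` set are all hypothetical. The sketch also assumes that "presence of a modality cue" means the current token is a cue. It is not the authors' feature set or cue dictionary, and the other features (vocabulary mapping, header name, suffix, orthography) are omitted.

```python
from typing import Dict, List, Sequence

# Hypothetical negation cues for illustration only; the paper's cue
# dictionaries (negation, suspicion, family) are not reproduced here.
NEGATION_CUES = {"なし", "ない", "認めず"}


def token_features(tokens: Sequence[str], pos_tags: Sequence[str], i: int,
                   before: int = 2, after: int = 3) -> Dict[str, str]:
    """Build CRF features for token i from a window of `before`
    preceding and `after` following tokens (assumed feature layout)."""
    feats: Dict[str, str] = {"pos": pos_tags[i]}
    for offset in range(-before, after + 1):
        j = i + offset
        # Positions outside the sentence are filled with a padding symbol.
        feats[f"w[{offset}]"] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    # Assumption: the modality-cue feature marks whether the current
    # token itself appears in the cue dictionary.
    feats["modality_cue"] = "1" if tokens[i] in NEGATION_CUES else "0"
    return feats


def sentence_features(tokens: Sequence[str],
                      pos_tags: Sequence[str]) -> List[Dict[str, str]]:
    """Return one feature dictionary per token in the sentence."""
    return [token_features(tokens, pos_tags, i) for i in range(len(tokens))]


if __name__ == "__main__":
    toks = ["発熱", "は", "なし"]          # "fever: none"
    tags = ["名詞", "助詞", "形容詞"]
    for f in sentence_features(toks, tags):
        print(f)
```

A list of such per-token dictionaries for each sentence matches the input shape used by CRF wrappers such as sklearn-crfsuite. The actual training setup in the paper may differ.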


3 Figures and Tables

Cite this paper

@inproceedings{Ahltorp2014HokuMedIN,
  title={HokuMed in NTCIR-11 MedNLP-2: Automatic Extraction of Medical Complaints from Japanese Health Records Using Machine Learning and Rule-based Methods},
  author={Magnus Ahltorp and Hideyuki Tanushi and Shiho Kitajima and Maria Skeppstedt and Rafał Rzepka and Kenji Araki},
  booktitle={NTCIR},
  year={2014}
}