Sunghwan Sohn

Learn More
We aim to build and evaluate an open-source natural language processing system for information extraction from electronic medical record clinical free-text. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source at http://www.ohnlp.org. The cTAKES builds on existing open-source(More)
The rapid growth of biomedical literature presents challenges for automatic text processing, and one of the challenges is abbreviation identification. The presence of unrecognized abbreviations in text hinders indexing algorithms and adversely affects information retrieval and extraction. Automatic abbreviation definition identification can help resolve(More)
Negation of clinical named entities is common in clinical documents and is a crucial factor to accurately compile patients' clinical conditions and to further support complex phenotype detection. In 2009, Mayo Clinic released the clinical Text Analysis and Knowledge Extraction System (cTAKES), which includes a negation annotator that identifies negation(More)
OBJECTIVE To extract physician-asserted drug side effects from electronic medical record clinical narratives. MATERIALS AND METHODS Pattern matching rules were manually developed through examining keywords and expression patterns of side effects to discover an individual side effect and causative drug relationship. A combination of machine learning (C4.5)(More)
In Electronic Health Records (EHRs), much of valuable information regarding patients' conditions is embedded in free text format. Natural language processing (NLP) techniques have been developed to extract clinical information from free text. One challenge faced in clinical NLP is that the meaning of clinical entities is heavily affected by modifiers such(More)
OBJECTIVES The aim of this study was to improve naïve Bayes prediction of Medical Subject Headings (MeSH) assignment to documents using optimal training sets found by an active learning inspired method. DESIGN The authors selected 20 MeSH terms whose occurrences cover a range of frequencies. For each MeSH term, they found an optimal training set, a subset(More)
BACKGROUND Temporal information detection systems have been developed by the Mayo Clinic for the 2012 i2b2 Natural Language Processing Challenge. OBJECTIVE To construct automated systems for EVENT/TIMEX3 extraction and temporal link (TLINK) identification from clinical text. MATERIALS AND METHODS The i2b2 organizers provided 190 annotated discharge(More)
OBJECTIVE We developed the Medication Extraction and Normalization (MedXN) system to extract comprehensive medication information and normalize it to the most appropriate RxNorm concept unique identifier (RxCUI) as specifically as possible. METHODS Medication descriptions in clinical notes were decomposed into medication name and attributes, which were(More)
This paper describes the sentiment classification system developed by the Mayo Clinic team for the 2011 I2B2/VA/Cincinnati Natural Language Processing (NLP) Challenge. The sentiment classification task is to assign any pertinent emotion to each sentence in suicide notes. We have implemented three systems that have been trained on suicide notes provided by(More)
Prevalence of abdominal aortic aneurysm (AAA) is increasing due to longer life expectancy and implementation of screening programs. Patient-specific longitudinal measurements of AAA are important to understand pathophysiology of disease development and modifiers of abdominal aortic size. In this paper, we applied natural language processing (NLP) techniques(More)