Guergana K. Savova

We aim to build and evaluate an open-source natural language processing system for information extraction from electronic medical record clinical free-text. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source. The cTAKES builds on existing open-source(More)
OBJECTIVES: We examine recent published research on the extraction of information from textual documents in the Electronic Health Record (EHR). METHODS: Literature review of the research published after 1995, based on PubMed, conference proceedings, and the ACM Digital Library, as well as on relevant publications referenced in papers already included. (More)
Discharge summaries and other free-text reports in healthcare transfer information between working shifts and geographic locations. Patients are likely to have difficulties in understanding their content, because of their medical jargon, non-standard abbreviations, and ward-specific idioms. This paper reports on an evaluation lab with an aim to support the(More)
This issue of JAMIA focuses on natural language processing (NLP) techniques for clinical-text information extraction. Several articles are offshoots of the yearly ‘Informatics for Integrating Biology and the Bedside’ (i2b2) NLP shared-task challenge, introduced by Uzuner et al (see page 552) and cosponsored by the Veteran’s(More)
This paper describes SemEval-2014 Task 7 on the Analysis of Clinical Text and presents the evaluation results. It focused on two subtasks: (i) identification (Task A) and (ii) normalization (Task B) of diseases and disorders in clinical reports as annotated in the Shared Annotated Resources (ShARe) corpus. This task was a follow-up to the ShARe/CLEF(More)
Clinical TempEval 2015 brought the temporal information extraction tasks of past TempEval campaigns to the clinical domain. Nine sub-tasks were included, covering problems in time expression identification, event expression identification and temporal relation identification. Participant systems were trained and evaluated on a corpus of clinical notes and(More)
This article discusses the requirements of a formal specification for the annotation of temporal information in clinical narratives. We discuss the implementation and extension of ISO-TimeML for annotating a corpus of clinical notes, known as the THYME corpus. To reflect the information task and the heavily inference-based reasoning demands in the domain, a(More)
We introduce an extensible and modifiable knowledge representation model to represent cancer disease characteristics in a comparable and consistent fashion. We describe a system, MedTAS/P, which automatically instantiates the knowledge representation model from free-text pathology reports. MedTAS/P is based on an open-source framework and its components use(More)
OBJECTIVE: To create annotated clinical narratives with layers of syntactic and semantic labels to facilitate advances in clinical natural language processing (NLP). To develop NLP algorithms and open source components. METHODS: Manual annotation of a clinical narrative corpus of 127 606 tokens following the Treebank schema for syntactic information,(More)
We present a comparative study between two machine learning methods, Conditional Random Fields (CRFs) and Support Vector Machines (SVMs), for clinical named entity recognition. We explore their applicability to the clinical domain. Evaluation against a set of gold standard named entities shows that CRFs outperform SVMs. The best F-score with CRFs is 0.86 and for the SVMs is(More)