Learn More
We aim to build and evaluate an open-source natural language processing system for information extraction from electronic medical record clinical free-text. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source at http://www.ohnlp.org. The cTAKES builds on existing open-source(More)
Information in electronic medical records is often in an unstructured free-text format. This format presents challenges for expedient data retrieval and may fail to convey important findings. Natural language processing (NLP) is an emerging technique for rapid and efficient clinical data retrieval. While proven in disease detection, the utility of NLP in(More)
OBJECTIVE The long-term goal of this work is the automated discovery of anaphoric relations from the clinical narrative. The creation of a gold standard set from a cross-institutional corpus of clinical notes and high-level characteristics of that gold standard are described. METHODS A standard methodology for annotation guideline development, gold(More)
Coreference resolution is the task of determining linguistic expressions that refer to the same real-world entity in natural language. Research on coreference resolution in the general English domain dates back to 1960s and 1970s. However, research on coreference resolution in the clinical free text has not seen major development. The recent US government(More)
Although joint inference is an effective approach to avoid cascading of errors when inferring multiple natural language tasks, its application to information extraction has been limited to modeling only two tasks at a time, leading to modest improvements. In this paper, we focus on the three crucial tasks of automated extraction pipelines: entity tagging,(More)
MOTIVATION Expressions that refer to a real-world entity already mentioned in a narrative are often considered anaphoric. For example, in the sentence "The pain comes and goes," the expression "the pain" is probably referring to a previous mention of pain. Interpretation of meaning involves resolving the anaphoric reference: deciding which expression in the(More)
Coreference resolution systems can benefit greatly from inclusion of global context, and a number of recent approaches have demonstrated improvements when precom-puting an alignment to external knowledge sources. However, since alignment itself is a challenging task and is often noisy, existing systems either align conservatively, resulting in very few(More)
OBJECTIVE To research computational methods for coreference resolution in the clinical narrative and build a system implementing the best methods. METHODS The Ontology Development and Information Extraction corpus annotated for coreference relations consists of 7214 coreferential markables, forming 5992 pairs and 1304 chains. We trained classifiers with(More)
The Worcester Heart Attack Study (WHAS) is a population-based surveillance project examining trends in the incidence, in-hospital, and long-term survival rates of acute myocardial infarction (AMI) among residents of central Massachusetts. It provides insights into various aspects of AMI. Much of the data has been assessed manually. We are developing(More)
It has been shown that accessing the patients’ own electronic health records (EHR) can enhance their medical understanding and provide clinically relevant benefits. However, languages that are difficult for non-medical professionals to comprehend are prevalent in the EHR notes. The valuable and authoritative information contained in the EHR is thus less(More)