• Corpus ID: 16408858

Was the Patient Cured? Understanding Semantic Categories and Their Relationships in Patient Records

Tawanda C. Sibanda
In this thesis, we detail an approach to extracting key information in medical discharge summaries. Starting with a narrative patient report, we first identify and remove information that compromises privacy (de-identification); next we recognize words and phrases in the text belonging to semantic categories of interest to doctors (semantic category recognition). For disease and symptoms, we determine whether the problem is present, absent, uncertain, or associated with somebody else (assertion… 
Biomarker information extraction tool (BIET) development using natural language processing and machine learning
BIET combines the solutions to semantic category recognition, assertion classification and semantic relationship classification into a single application that facilitates the easy extraction of semantic information from medical text.
Extracting Biomarker Information Applying Natural Language Processing and Machine Learning
The system, Biomarker Information Extraction Tool (BIET), implements machine learning-based biomarker extraction using support vector machines (SVM) and is trained and tested on a corpus of oncology-related PubMed/MEDLINE literature hand-annotated with biomarker information.
Shallow Features for Differentiating Disease-Treatment Relations Using Supervised Learning: A Pilot Study
The problem is framed as a supervised machine learning task in which the relations are learned from pre-annotated data; the challenges in designing the problem and the empirical results are presented.
A de-identifier for medical discharge summaries
Feature engineering combined with machine learning and rule-based methods for structured information extraction from narrative clinical discharge summaries
In concept extraction, it is demonstrated that switching between models, one of which is designed specifically for telegraphic sentences, significantly improved extraction of the treatment concept, and that a set of features derived from a rule-based classifier was effective for classes such as conditional and possible.
A de-identifier for electronic medical records based on a heterogeneous feature set
This thesis describes an extended and specialized Named Entity Recognizer (NER) that detects instances of Protected Health Information in electronic medical records (a de-identifier), and shows that the benefit of an inclusive feature set outweighs the harm from the very large dimensionality of the resulting classification problem.
Personalized medicine through automatic extraction of information from medical texts
This thesis’s goal is to prove that natural language processing and machine learning techniques represent reliable solutions for solving important medical-related problems.
Extraction of Disease Relationship from Medical Records: Vector Based Approach
A method is proposed that extracts semantics from medical discharge summaries using a vector-based approach, in order to identify the semantic relationships between diseases and to list the diseases that the patient may encounter.
Semi-Supervised Learning to Identify UMLS Semantic Relations
  • Yuan Luo, Ozlem Uzuner
  • Computer Science
    AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science
  • 2014
This work proposes and implements a semi-supervised approach for automatically identifying UMLS semantic relations from narrative text in PubMed. It analyzes biomedical narrative text to collect semantic entity pairs and extracts multiple semantic, syntactic and orthographic features for the collected pairs.
It is hypothesized that problem list generation can be approached as a two-step classification problem - problem mention status and patient problem status (Aim Two) classification, which will automatically classify the status of each problem mention using semantic features about problems described in the clinical narrative.


Identification of patient name references within medical documents using semantic selectional restrictions
The proposed algorithm is based on estimating the fitness of candidate patient name references to a set of semantic selectional restrictions that place tight contextual requirements upon candidate words in the report text and are determined automatically from a manually tagged corpus of training reports.
A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries
It is concluded that with little implementation effort a simple regular expression algorithm for determining whether a finding or disease mentioned within narrative medical reports is present or absent can identify a large portion of the pertinent negatives from discharge summaries.
A controlled trial of automated classification of negation from clinical notes
Automated assignment of negation to concepts identified in health records, based on review of the text, is feasible and practical. Lexical assignment of negation is a good test of true negativity, as judged by the high sensitivity, specificity and positive likelihood ratio of the test.
Indexing UMLS Semantic Types for Medical Question-Answering
It is shown, using statistical studies, that strategies for using these new tags in a QA context must take into account the individual origin of documents.
Research Paper: A General Natural-language Text Processor for Clinical Radiology
Development of a general natural-language processor that identifies clinical information in narrative reports and maps that information into a structured representation containing clinical terms, using radiology as the test domain.
Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program
The UMLS Metathesaurus, the largest thesaurus in the biomedical domain, provides a representation of biomedical knowledge consisting of concepts classified by semantic type and both hierarchical and
Extracting Diagnoses from Discharge Summaries
A program for extracting the diagnoses and procedures from the past medical history and discharge diagnoses in the discharge summary of a case and coding these using SNOMED-CT in the UMLS using a limited amount of natural language processing.
Classifying Semantic Relations in Bioscience Texts
This work examines the problem of distinguishing among seven relation types that can occur between the entities "treatment" and "disease" in bioscience text, and finds that the latter help achieve high classification accuracy.
Probabilistic Reasoning for Entity & Relation Recognition
This paper develops a method for recognizing relations and entities in sentences while taking the mutual dependencies among them into account. Preliminary experimental results are promising and show that the global inference approach improves over learning relations and entities separately.
Automation of a problem list using natural language processing
The global aim of this project is to automate the process of creating and maintaining a problem list for hospitalized patients and thereby help to guarantee the timeliness, accuracy and completeness of this information.