A de-identifier for medical discharge summaries
@article{Uzuner2008ADF, title={A de-identifier for medical discharge summaries}, author={{\"O}zlem Uzuner and Tawanda C. Sibanda and Yuan Luo and Peter Szolovits}, journal={Artificial intelligence in medicine}, year={2008}, volume={42}, number={1}, pages={13--35} }
99 Citations
Automatic de-identification of textual documents in the electronic health record: a review of recent research
- Computer Science
- BMC medical research methodology
- 2010
A review of recent research in automated de-identification of narrative text documents from the electronic health record finds that methods based on dictionaries performed better with PHI that is rarely mentioned in clinical text but are more difficult to generalize.
Improved de-identification of physician notes through integrative modeling of both public and private medical text
- Medicine
- BMC Medical Informatics and Decision Making
- 2013
The results indicate that distributional differences between private and public medical text can be used to accurately classify PHI; the authors train a model to recognize non-PHI words and phrases that appear in public medical texts.
De-identification of clinical narratives through writing complexity measures
- Computer Science
- Int. J. Medical Informatics
- 2014
De-identification of patient notes with recurrent neural networks
- Computer Science, Medicine
- J. Am. Medical Informatics Assoc.
- 2017
Introduces the first de-identification system based on artificial neural networks (ANNs), which, unlike existing systems, requires no handcrafted features or rules and outperforms state-of-the-art systems.
Evaluating current automatic de-identification methods with Veteran’s health administration clinical documents
- Medicine
- BMC Medical Research Methodology
- 2012
An evaluation of existing automated text de-identification methods and tools, as applied to Veterans Health Administration (VHA) clinical documents, to assess which methods perform better with each category of PHI found in clinical notes and when new methods are needed to improve performance.
Named Entity Recognition in Unstructured Medical Text Documents
- Medicine
- 2021 International Conference on Electrical, Computer and Energy Technologies (ICECET)
- 2021
The NER toolkits of OpenNLP and spaCy are applied to identify and subsequently remove or encode PII in IME reports prepared by the physician. Both platforms achieve high performance at de-identification, and a spaCy model trained with a 70–30 train-test data split performs best.
A de-identifier for electronic medical records based on a heterogeneous feature set
- Computer Science
- 2011
This thesis describes an extended and specialized Named Entity Recognizer (NER) that detects instances of Protected Health Information in electronic medical records (a de-identifier) and shows that the benefit of an inclusive feature set outweighs the harm from the very large dimensionality of the resulting classification problem.
Combining knowledge- and data-driven methods for de-identification of clinical narratives
- Computer Science
- J. Biomed. Informatics
- 2015
DEDUCE: A pattern matching method for automatic de-identification of Dutch medical text
- Medicine
- Telematics Informatics
- 2018
Rule-based information extraction from patients' clinical data
- Computer Science
- J. Biomed. Informatics
- 2009
References
SHOWING 1-10 OF 60 REFERENCES
Identification of patient name references within medical documents using semantic selectional restrictions
- Computer Science
- AMIA
- 2002
The proposed algorithm is based on estimating the fitness of candidate patient name references to a set of semantic selectional restrictions that place tight contextual requirements upon candidate words in the report text and are determined automatically from a manually tagged corpus of training reports.
Viewpoint Paper: Evaluating the State-of-the-Art in Automatic De-identification
- Computer Science
- J. Am. Medical Informatics Assoc.
- 2007
An overview of this de-identification challenge is provided, the data and the annotation process are described, the evaluation metrics are explained, the nature of the systems that addressed the challenge are discussed, the results of received system runs are analyzed, and directions for future research are identified.
Was the Patient Cured? Understanding Semantic Categories and Their Relationships in Patient Records
- Computer Science
- 2006
CaRE combines the solutions to de-identification, semantic category recognition, assertion classification, and semantic relationship classification into a single application that facilitates the easy extraction of semantic information from medical text.
Computer-Assisted De-Identification of Free-text Nursing Notes
- Medicine
- 2005
A semi-automated method was developed to allow clinicians to highlight PHI on the screen of a tablet PC and to compare and combine the selections of different experts reading the same notes, and expert adjudication demonstrated that inter-human variability was high.
Evaluation of a deidentification (De-Id) software engine to share pathology reports and clinical documents for research.
- Medicine
- American journal of clinical pathology
- 2004
By the end of the evaluation, the system was reliably and specifically removing safe-harbor identifiers and producing highly readable deidentified text without removing important clinical information.
Concept-match medical data scrubbing. How pathology text can be used in research.
- Medicine
- Archives of pathology & laboratory medicine
- 2003
Computerized scrubbing can render the textual portion of a pathology report harmless for research purposes, and this article addresses the problem of data scrubbing.
Development and evaluation of an open source software tool for deidentification of pathology reports
- Medicine
- BMC Medical Informatics Decis. Mak.
- 2006
There was variation in performance among reports from the three institutions, highlighting the need for site-specific customization, which is easily accomplished with the open source, HIPAA compliant, deidentification tool.
The Unified Medical Language System.
- Computer Science
- Yearbook of medical informatics
- 1993
The UMLS project and current developments in high-speed, high-capacity international networks are converging in ways that have great potential for enhancing access to biomedical information.
Replacing personally-identifying information in medical records, the Scrub system.
- Computer Science
- Proceedings: a conference of the American Medical Informatics Association. AMIA Fall Symposium
- 1996
We define a new approach to locating and replacing personally-identifying information in medical records that extends beyond straight search-and-replace procedures, and we provide techniques for…
Research Paper: Fast Exact String Pattern-matching Algorithms Adapted to the Characteristics of the Medical Language
- Computer Science
- J. Am. Medical Informatics Assoc.
- 2000
The time performance of exact string pattern matching can be greatly improved if an efficient algorithm is used, and considering the growing amount of text handled in the electronic patient record, it is worth implementing this efficient algorithm.