Research Paper: Rapidly Retargetable Approaches to De-identification in Medical Records

@article{Wellner2007ResearchPR,
  title={Research Paper: Rapidly Retargetable Approaches to De-identification in Medical Records},
  author={Ben Wellner and Matt Huyck and Scott A. Mardis and John S. Aberdeen and Alexander A. Morgan and Leonid Peshkin and Alexander S. Yeh and Janet Hitzeman and Lynette Hirschman},
  journal={Journal of the American Medical Informatics Association : JAMIA},
  year={2007},
  volume={14},
  number={5},
  pages={564--573}
}
OBJECTIVE This paper describes a successful approach to de-identification that was developed to participate in a recent AMIA-sponsored challenge evaluation. METHOD Our approach focused on rapid adaptation of two existing named entity recognition toolkits, Carafe and LingPipe. RESULTS The "out of the box" Carafe system achieved a very good score (phrase F-measure of 0.9664) with only four hours of work to adapt it to the de-identification task. With further tuning…
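The phrase F-measure reported above is the harmonic mean of precision and recall computed over exactly matching PHI phrases. The minimal Python sketch below illustrates how such a phrase-level score can be computed; the (start, end, type) span representation and the example spans are hypothetical illustrations, not the official challenge scoring tool.

# Minimal sketch of phrase-level precision/recall/F-measure scoring for
# de-identification. Gold and system PHI phrases are assumed to be exact
# (start, end, type) spans; this is an illustration of the metric, not the
# official AMIA/i2b2 challenge scorer.

def phrase_prf(gold_spans, predicted_spans):
    """Compute precision, recall, and F1 over exact-match phrase spans."""
    gold = set(gold_spans)
    pred = set(predicted_spans)
    true_positives = len(gold & pred)

    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical spans: (token_start, token_end, PHI_type)
    gold = [(0, 2, "PATIENT"), (10, 11, "DATE"), (20, 22, "DOCTOR")]
    pred = [(0, 2, "PATIENT"), (10, 11, "DATE"), (30, 31, "HOSPITAL")]
    p, r, f = phrase_prf(gold, pred)
    print(f"precision={p:.4f} recall={r:.4f} F1={f:.4f}")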

Citations

Viewpoint Paper: Evaluating the State-of-the-Art in Automatic De-identification
TLDR
An overview of this de-identification challenge is provided, the data and the annotation process are described, the evaluation metrics are explained, the nature of the systems that addressed the challenge are discussed, the results of received system runs are analyzed, and directions for future research are identified.
Large-scale evaluation of automated clinical note de-identification and its impact on information extraction
TLDR
NLP-based de-identification shows excellent performance that rivals the performance of human annotators and scales up to millions of documents quickly and inexpensively.
Patient Data De-Identification: A Conditional Random-Field-Based Supervised Approach
TLDR
Insight is provided into the de-identification task, its major challenges, techniques to address challenges, detailed analysis of the results and direction of future improvement, and a supervised machine learning technique for solving the problem of patient data deidentification.
Bootstrapping a de-identification system for narrative patient records: Cost-performance tradeoffs
Ensemble-based Methods to Improve De-identification of Electronic Health Record Narratives
TLDR
This work presents three different ensemble methods that combine multiple de-identification models trained from deep learning, shallow learning, and rule-based approaches, and shows that the stacked learning ensemble is more effective than other ensemble methods, producing the highest recall, the most important metric for de-identification.
BoB, a best-of-breed automated text de-identification system for VHA clinical documents
TLDR
The authors' system successfully addressed VHA clinical document de-identification, and its hybrid stepwise design demonstrates robustness and efficiency, prioritizing patient confidentiality while leaving most clinical information intact.
A Recurrent Neural Network Architecture for De-identifying Clinical Records
TLDR
This paper proposes a deep neural network based architecture for de-identification of 7 PHI categories with 25 associated subcategories and shows that the proposed system achieves significant improvement over the baseline and performance comparable to the state of the art.

References

SHOWING 1-10 OF 16 REFERENCES
Viewpoint Paper: Evaluating the State-of-the-Art in Automatic De-identification
TLDR
An overview of this de-identification challenge is provided, the data and the annotation process are described, the evaluation metrics are explained, the nature of the systems that addressed the challenge are discussed, the results of received system runs are analyzed, and directions for future research are identified.
Role of Local Context in Automatic Deidentification of Ungrammatical, Fragmented Text
TLDR
It is shown that one can deidentify medical discharge summaries using support vector machines that rely on a statistical representation of local context, which contributes more to deidentification than dictionaries and hand-tailored heuristics.
Weakly Supervised Learning Methods for Improving the Quality of Gene Name Normalization Data
TLDR
This work explores methods for improving the quality of (noisy) Task 1B training data using variants of weakly supervised learning methods and presents positive results demonstrating that these methods result in an improvement in training data quality as measured by improved system performance over the same system using the originally labeled data.
Research Paper: Automating the Assignment of Diagnosis Codes to Patient Encounters Using Example-based and Machine Learning Techniques
TLDR
An automated coding system designed to assign codes to clinical diagnoses has been successfully implemented at the Mayo Clinic, which resulted in a reduction of staff engaged in manual coding from thirty-four coders to seven verifiers.
An Effective Two-Stage Model for Exploiting Non-Local Dependencies in Named Entity Recognition
This paper shows that a simple two-stage approach to handle non-local dependencies in Named Entity Recognition (NER) can outperform existing approaches that handle non-local dependencies, while being much more computationally efficient.
Leveraging Machine Readable Dictionaries in Discriminative Sequence Models
TLDR
The utility of corpora-independent lexicons derived from machine-readable dictionaries is demonstrated, and substantial error reductions are shown for the tasks of part-of-speech tagging and shallow parsing.
Identifying gene and protein mentions in text using conditional random fields
TLDR
A diverse feature set containing standard orthographic features combined with expert features in the form of gene and biological term lexicons is employed to achieve a precision of 86.4% and recall of 78.7% for tagging gene and protein mentions from text using the probabilistic sequence tagging framework of conditional random fields.
Overview of results of the MUC-6 evaluation
TLDR
The latest in a series of natural language processing system evaluations, the MUC-6 evaluation included Named Entity and Coreference tasks, which entailed Standard Generalized Markup Language (SGML) annotation of texts and were conducted for the first time.
NER Systems that Suit User’s Preferences: Adjusting the Recall-Precision Trade-off for Entity Extraction
TLDR
This method based on "tweaking" an existing learned sequential classifier to change the recall-precision tradeoff, guided by a user-provided performance criterion, is described and proves to be both simple and effective.