Developing a standard for de-identifying electronic patient records written in Swedish: Precision, recall and F-measure in a manual and computerized annotation trial

@article{Velupillai2009DevelopingAS,
  title={Developing a standard for de-identifying electronic patient records written in Swedish: Precision, recall and F-measure in a manual and computerized annotation trial},
  author={Sumithra Velupillai and Hercules Dalianis and Martin Hassel and Gunnar H. Nilsson},
  journal={International Journal of Medical Informatics},
  year={2009},
  volume={78},
  number={12},
  pages={e19--e26}
}

Tables from this paper

De-identifying Swedish clinical text - refinement of a gold standard and experiments with Conditional random fields
TLDR
This work presents the creation of two refined variants of a manually annotated gold standard for de-identification, one created automatically and one created through discussions among the annotators, both of which are based on the Stockholm EPR Corpus.
Creating and Evaluating a Synthetic Norwegian Clinical Corpus for De-Identification
TLDR
An existing Norwegian synthetic clinical corpus, NorSynthClinical, has been extended with PHIs and annotated by two annotators, who obtained an inter-annotator agreement of 0.94 F1-measure.
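Inter-annotator agreement expressed as an F1-measure is commonly computed by treating one annotator's spans as the reference and the other's as the response; because F1 weights precision and recall equally, swapping the two annotators gives the same value. A minimal sketch, assuming exact-match spans stored as hypothetical `(start, end, label)` tuples (the `pairwise_f1` helper and the class labels are illustrative, not taken from the cited work):

```python
from typing import Set, Tuple

# Hypothetical span representation: (start offset, end offset, PHI class label).
Span = Tuple[int, int, str]

def pairwise_f1(reference: Set[Span], response: Set[Span]) -> float:
    """Exact-match F1 between two annotators, treating `reference` as the gold side."""
    if not reference and not response:
        return 1.0  # neither annotator marked anything: count as full agreement
    matches = len(reference & response)
    precision = matches / len(response) if response else 0.0
    recall = matches / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

annotator_1 = {(0, 5, "First_Name"), (10, 18, "Date_Part"), (25, 32, "Location")}
annotator_2 = {(0, 5, "First_Name"), (10, 18, "Date_Part"), (40, 45, "Age")}
print(round(pairwise_f1(annotator_1, annotator_2), 3))  # 0.667
```

Exact matching is the strictest choice; partial-overlap matching would generally yield higher agreement figures.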
De-identifying Swedish clinical text - refinement of a gold standard and experiments with Conditional random fields
TLDR
This work presents the creation of two refined variants of a manually annotated gold standard for de-identification of Swedish EPRs, one created automatically and one created through discussions among the annotators, and reports experiments with the Conditional Random Fields algorithm.
A Semi-supervised Approach for De-identification of Swedish Clinical Text
TLDR
A semi-supervised method for automatically creating high-quality training data is proposed; it improves recall from 84.75% to 89.20% while precision drops only slightly, from 95.73% to 94.20%.
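The recall/precision trade-off above can be summarized with the F-measure, the weighted harmonic mean of precision and recall. A minimal sketch computing F1 from the quoted figures; the `f_measure` helper is our own illustration, and the resulting F1 values are derived here by arithmetic rather than quoted from the paper:

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

before = f_measure(0.9573, 0.8475)  # precision, recall before the semi-supervised step
after = f_measure(0.9420, 0.8920)   # precision, recall after
print(f"F1 before: {before:.4f}, after: {after:.4f}")  # F1 before: 0.8991, after: 0.9163
```

Setting `beta` above 1 weights recall more heavily, which may suit de-identification, where a missed identifier usually costs more than an over-redacted word.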
Is the Juice Worth the Squeeze? Costs and Benefits of Multiple Human Annotators for Clinical Text De-identification.
TLDR
Incorporating a second annotator into a PII annotation process reduces unredacted PII and improves the quality of annotations to 0.99 recall, yielding clear benefit at reasonable cost; the cost advantages of annotation teams larger than two diminish rapidly.
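Why a second annotator raises recall can be illustrated by merging two passes so that a span is redacted if either annotator marked it. A minimal sketch, assuming exact-match character spans and a simple union merge; this is a simplification, and the cited study's actual adjudication process may differ:

```python
from typing import Set, Tuple

# Hypothetical span representation: (start, end) character offsets.
Span = Tuple[int, int]

def recall(gold: Set[Span], predicted: Set[Span]) -> float:
    """Fraction of gold PII spans covered by the annotation (exact match)."""
    return len(gold & predicted) / len(gold) if gold else 1.0

gold = {(0, 4), (10, 15), (20, 28), (30, 35)}
annotator_a = {(0, 4), (10, 15), (20, 28)}  # misses (30, 35)
annotator_b = {(0, 4), (30, 35)}            # misses two spans, but not the same ones
merged = annotator_a | annotator_b          # union: redact if either annotator marked it

print(recall(gold, annotator_a), recall(gold, merged))  # 0.75 1.0
```

The union only helps when the annotators' misses do not fully overlap, and it also carries over each annotator's false positives, so precision can fall as recall rises.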
A De-identification Method for Bilingual Clinical Texts of Various Note Types
TLDR
This study proposes a regular-expression-based de-identification method for bilingual clinical records written in Korean and English; the method successfully removed identifiers from diverse types of bilingual clinical narrative texts.
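The general regular-expression approach can be sketched as an ordered list of patterns, each replacing its matches with a category tag. The patterns below (a date format, a phone-number format, and an `MRN` record-number prefix) are hypothetical and are not the ones used in the cited study; a real system for Korean and English records would need many more, tuned to local formats:

```python
import re

# Hypothetical patterns; applied in list order.
PATTERNS = [
    ("DATE", re.compile(r"\b\d{4}[-./]\d{1,2}[-./]\d{1,2}\b")),
    ("PHONE", re.compile(r"\b0\d{1,2}-\d{3,4}-\d{4}\b")),
    ("MRN", re.compile(r"\bMRN[:\s]*\d{6,10}\b")),
]

def deidentify(text: str) -> str:
    """Replace every match of each pattern with a bracketed category tag."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Visited 2015-03-02, MRN: 12345678, call 02-123-4567. 환자 상태 양호."
print(deidentify(note))  # Visited [DATE], [MRN], call [PHONE]. 환자 상태 양호.
```

Python 3 `str` patterns are Unicode-aware, so the Korean text passes through untouched; only the identifier-shaped substrings are replaced.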
De-identification of primary care electronic medical records free-text data in Ontario, Canada
TLDR
The deid program can be modified to de-identify free-text primary care EMR records with reasonable accuracy while preserving clinical content.
Influence of Module Order on Rule-Based De-identification of Personal Names in Electronic Patient Records Written in Swedish
TLDR
Four common rules for de-identification of personal names in EPRs written in Swedish are implemented and evaluated and it is shown that to obtain the highest recall and precision, the rules should be applied in the following order.
...
...

References

Showing 1-10 of 18 references
Automated de-identification of free-text medical records
TLDR
An automated Perl-based de-identification software package is described that is usable on most free-text medical records (e.g., nursing notes, discharge summaries, X-ray reports) and is general enough to be customized to handle text files of any format.
Diagnosing Diagnoses in Swedish Clinical Records
TLDR
This project has access to a large set of clinical records from several departments in one of the largest hospitals in Sweden, an invaluable data set for many research areas, and plans to apply and evaluate existing state-of-the-art methods on these Swedish clinical records.
A de-identifier for medical discharge summaries
Viewpoint Paper: Evaluating the State-of-the-Art in Automatic De-identification
TLDR
An overview of the de-identification challenge is provided: the data and annotation process are described, the evaluation metrics are explained, the systems that addressed the challenge are discussed, the results of the submitted system runs are analyzed, and directions for future research are identified.
Role of Local Context in Automatic Deidentification of Ungrammatical, Fragmented Text
TLDR
It is shown that medical discharge summaries can be de-identified using support vector machines that rely on a statistical representation of local context, which contributes more to de-identification than dictionaries and hand-tailored heuristics.
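A "statistical representation of local context" typically means describing each token by the words around it, so that a classifier such as an SVM can learn cues like a title preceding a name. A minimal sketch of such feature extraction; the window size, feature names, and `shape` helper are assumptions for illustration, not the cited paper's feature set, and the SVM itself is omitted:

```python
from typing import Dict, List

def shape(token: str) -> str:
    """Coarse orthographic shape: X for uppercase, x for lowercase, d for digit."""
    out = []
    for ch in token:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)
    return "".join(out)

def context_features(tokens: List[str], i: int, window: int = 2) -> Dict[str, str]:
    """Features for tokens[i]: the token itself plus neighbors within `window`."""
    feats = {"w0": tokens[i].lower(), "shape0": shape(tokens[i])}
    for offset in range(1, window + 1):
        # Pad with sentence-boundary markers when the window runs off either end.
        left = tokens[i - offset] if i - offset >= 0 else "<s>"
        right = tokens[i + offset] if i + offset < len(tokens) else "</s>"
        feats[f"w-{offset}"] = left.lower()
        feats[f"w+{offset}"] = right.lower()
    return feats

tokens = "pt seen by Dr. Smith on 3/14".split()
print(context_features(tokens, 4))
```

Because the features come from the neighborhood rather than from a name list, a surname absent from any dictionary can still be flagged when it follows a cue such as "Dr.".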
New directions in biomedical text annotation: definitions, guidelines and corpus construction
TLDR
The results of an inquiry into properties of scientific text that are general enough to transcend the confines of a narrow subject area, while still supporting practical mining of text for factual information, are reported.
State-of-the-art anonymization of medical records using an iterative machine learning framework.
TLDR
A de-identification model is developed that can successfully remove personal health information (PHI) from discharge records so that they conform to the guidelines of the Health Insurance Portability and Accountability Act.
Bootstrapping Named Entity Annotation by Means of Active Machine Learning: A Method for Creating Corpora
TLDR
The results suggest that while the recognizer produced in phases one and two is as useful for pre-tagging as a recognizer created from randomly selected documents, the applicability of the recognizer created during phase two as a pre-tagger in phase three is best investigated by conducting a user study involving real annotators working on a real named entity recognition task.
SweNam - A Swedish Named Entity recognizer: Its construction, training and evaluation
TLDR
The development, training and evaluation of a Swedish Named Entity (NE) tagger called SweNam is described, and it is shown that rule-based recognition with training can obtain about 92 percent precision and 46 percent recall on the named entities of a text.
Testing Tactics to Localize De-Identification
TLDR
A first, coarse de-identification step is performed in the hospital for new documents in a language other than English, here French patient reports, and two methods are tested: the first adapts an existing US de-identifier for English; the second re-develops a new system that applies the same methods.
...
...