De-identifying Swedish clinical text - refinement of a gold standard and experiments with Conditional random fields
@article{Dalianis2010DeidentifyingSC, title={De-identifying Swedish clinical text - refinement of a gold standard and experiments with Conditional random fields}, author={Hercules Dalianis and Sumithra Velupillai}, journal={Journal of Biomedical Semantics}, year={2010}, volume={1}, pages={6 - 6} }
Background: In order to perform research on the information contained in Electronic Patient Records (EPRs), access to the data itself is needed. This is often very difficult due to confidentiality regulations. The data sets need to be fully de-identified before they can be distributed to researchers. De-identification is a difficult task where the definitions of annotation classes are not self-evident. Results: We present work on the creation of two refined variants of a manually annotated Gold…
43 Citations
De-identification of clinical notes in French: towards a protocol for reference corpus development
- Computer Science
- J. Biomed. Informatics
- 2014
A Semi-supervised Approach for De-identification of Swedish Clinical Text
- Computer Science
- LREC
- 2020
A semi-supervised method for automatically creating high-quality training data is proposed and shown to improve recall from 84.75% to 89.20% at a smaller cost in precision, which drops from 95.73% to 94.20%.
De-identifying free text of Japanese electronic health records
- Computer Science, Medicine
- J. Biomed. Semant.
- 2020
The LSTM-based machine learning method extracted the named entities to be de-identified with generally better performance than the authors' rule-based methods; however, machine learning methods are inadequate for processing expressions with low occurrence.
The OpenDeID corpus for patient de-identification
- Medicine
- Scientific reports
- 2021
The results suggest that the pre-annotations approach is not reliable in terms of quality when compared to the serial annotations but can drastically reduce annotation time.
Automatic Clinical Text De-Identification: Is It Worth It, and Could It Work for Me?
- Medicine
- MedInfo
- 2013
This panel will focus on issues related to the automatic de-identification of clinical text, including an overview of the domain, a demonstration of good examples of such applications in English and in Swedish with their main authors sharing development and adaptation experiences, and a discussion of HIPAA “Safe Harbor” de-identification quality and the risk of re-identification.
Augmenting a De-identification System for Swedish Clinical Text Using Open Resources and Deep Learning
- Computer Science
- 2019
Two machine learning algorithms, Long Short-Term Memory (LSTM) and Conditional Random Fields (CRF), are compared on a Swedish clinical data set annotated for de-identification, and CRF is shown to perform better than deep learning with LSTM.
Bootstrapping a de-identification system for narrative patient records: Cost-performance tradeoffs
- Computer Science
- Int. J. Medical Informatics
- 2013
A Hybrid Semi-supervised Learning Approach to Identifying Protected Health Information in Electronic Medical Records
- Computer Science
- IMCOM
- 2016
This paper proposes a hybrid semi-supervised learning approach to identifying protected health information (PHI) in electronic medical records that combines a machine learning-based method with a conditional random fields model and a rule-based method in a post-processing phase to handle 8 PHI types with disambiguation.
Influence of Module Order on Rule-Based De-identification of Personal Names in Electronic Patient Records Written in Swedish
- Computer Science
- LREC
- 2010
Four common rules for de-identification of personal names in EPRs written in Swedish are implemented and evaluated and it is shown that to obtain the highest recall and precision, the rules should be applied in the following order.
Building a De-identification System for Real Swedish Clinical Text Using Pseudonymised Clinical Text
- Computer Science
- EMNLP
- 2019
It is concluded that it is possible to train transferable models based on pseudonymised Swedish clinical data, but even small narrative and distributional variation could negatively impact performance.
References
Developing a standard for de-identifying electronic patient records written in Swedish: Precision, recall and F-measure in a manual and computerized annotation trial
- Medicine
- Int. J. Medical Informatics
- 2009
Annotating and Recognising Named Entities in Clinical Notes
- Computer Science
- ACL
- 2009
A new genre of text that is not well written, noise-prone, ungrammatical, and full of cryptic content is introduced: a mix of clinical progress notes drawn from an Intensive Care Service and clinical named entities.
Viewpoint Paper: Evaluating the State-of-the-Art in Automatic De-identification
- Computer Science
- J. Am. Medical Informatics Assoc.
- 2007
An overview of this de-identification challenge is provided, the data and the annotation process are described, the evaluation metrics are explained, the nature of the systems that addressed the challenge is discussed, the results of received system runs are analyzed, and directions for future research are identified.
Automated de-identification of free-text medical records
- Medicine
- BMC Medical Informatics Decis. Mak.
- 2008
An automated Perl-based de-identification software package is described that is generally usable on most free-text medical records (e.g., nursing notes, discharge summaries, X-ray reports) and can be customized to handle text files of any format.
Evaluation of a deidentification (De-Id) software engine to share pathology reports and clinical documents for research.
- Medicine
- American journal of clinical pathology
- 2004
By the end of the evaluation, the system was reliably and specifically removing safe-harbor identifiers and producing highly readable deidentified text without removing important clinical information.
The Stockholm EPR Corpus – Characteristics and Some Initial Findings
- Computer Science
- 2009
The characteristics of the Stockholm Electronic Patient Record Corpus (the SEPR Corpus), an important resource for performing research on clinical data, are described; the corpus has features that are very interesting from a linguistic point of view, such as domain-specific compounds and abbreviations, and various narratives.
Testing Tactics to Localize De-Identification
- Computer Science
- MIE
- 2009
A first gross de-identification step is performed in the hospital for new documents in a language other than English, here French patient reports, and two methods are tested: the first attempts to adapt an existing US de-identifier for English, and the second develops a new system that applies the same methods.
Identification of Entity References in Hospital Discharge Letters
- Medicine
- NODALIDA
- 2007
A system for automatic identification of named entities in Swedish clinical free text, in the form of discharge letters, by applying generic named entity recognition technology with minor adaptations is presented.
Towards a Methodology for Named Entities Annotation
- Computer Science
- Linguistic Annotation Workshop
- 2009
This work identifies the applications that use named entity recognition, proposes to semantically define the elements to annotate, and puts forward a number of methodological recommendations to ensure a coherent and reliable annotation scheme.