GGPONC: A Corpus of German Medical Text with Rich Metadata Based on Clinical Practice Guidelines

Florian Borchert, Christina Lohr, Luise Modersohn, T. Langer, M. Follmann, J. Sachs, U. Hahn, M. Schapranow
The lack of publicly accessible text corpora is a major obstacle to progress in natural language processing. For medical applications, unfortunately, all language communities other than English are low-resourced. In this work, we present GGPONC (German Guideline Program in Oncology NLP Corpus), a freely distributable German-language corpus based on clinical practice guidelines for oncology. This corpus is one of the largest ever built from German medical documents. Unlike clinical documents…


Combining open-source natural language processing tools to parse clinical practice guidelines
The results of this paper show that, with some adaptation, open-source NLP tools can be retargeted to new tasks and achieve accuracy equivalent to that of methods designed for those specific tasks.
Knowledge-based best of breed approach for automated detection of clinical events based on German free text digital hospital discharge letters
A knowledge-based best-of-breed approach, combining a terminology server with an integrated ontology, an NLP pipeline, and a rules engine, was effective for automatically detecting clinical events such as drug-disease interactions in free-text digital hospital discharge letters.
A fine-grained corpus annotation schema of German nephrology records
This work presents a fine-grained annotation schema for detecting named entities in German clinical data of chronically ill patients with kidney diseases, together with a semi-automatic annotation method that uses additional knowledge sources, such as UMLS, to pre-annotate concepts in advance.
Quantitative analysis of manual annotation of clinical text samples
The overall low inter-annotator agreement (IAA) results pose a challenge for interoperability and indicate the need for further research to assess whether consistent terminology implementation is possible across Europe, e.g., by improving term coverage through localized versions of the selected terminologies, analysing the causes of low IAA, and improving tooling and guidance for annotators.
Sharing Copies of Synthetic Clinical Corpora without Physical Distribution — A Case Study to Get Around IPRs and Privacy Constraints Featuring the German JSYNCC Corpus
This work introduces a novel approach to the creation and re-use of clinical corpora based on a two-step workflow, and presents JSYNCC, the largest and, even more importantly, the first publicly available corpus of German clinical language.
Unsupervised Abbreviation Detection in Clinical Narratives
The results are promising for a domain-independent abbreviation detection strategy, because the approach avoids the model retraining and use-case-specific feature engineering that supervised machine learning approaches require.
Semi-Automatic Terminology Generation for Information Extraction from German Chest X-Ray Reports
This paper reports on an algorithm for the first step of semi-automatically generating a local terminology and evaluates it on radiology reports of chest X-ray examinations from Würzburg University Hospital.
Annotating German Clinical Documents for De-Identification
This work devised annotation guidelines for the de-identification of German clinical documents, assembled a corpus of 1,106 discharge summaries and transfer letters with 44K annotated protected health information (PHI) items, and trained a recurrent neural network.
3000PA - Towards a National Reference Corpus of German Clinical Language
We introduce 3000PA, a clinical document corpus composed of 3,000 EPRs from three different clinical sites, which will serve as the backbone of a national reference language resource for German.
Semi-Automatic Mark-Up and UMLS Annotation of Clinical Guidelines
A semi-automated approach to mark-up and UMLS annotation of clinical guidelines using natural language processing techniques is proposed, and it has been tested and evaluated on a German breast cancer guideline.