Johannes Hellrich

Learn More
Multilingual terminological resources can be drawn from parallel corpora in the languages of interest, possibly exploiting machine translation solutions for term identification. This main objective of the CLEF-ER challenge involves parallel corpora in English and other languages. The challenge organisers have gathered and normalized documents from the(More)
Confidential corpora from the medical, enterprise, security or intelligence domains often contain sensitive raw data which lead to severe restrictions as far as the public accessibility and distribution of such language resources are concerned. The enforcement of strict mechanisms of data protection consitutes a serious barrier for progress in language(More)
Our research aims at tracking the semantic evolution of the lexicon over time. For this purpose, we investigated two well-known training protocols for neural language models in a synchronic experiment and encountered several problems relating to accuracy and reliability. We were able to identify critical parameters for improving the underlying protocols in(More)
The automatic processing of non-English clinical documents is massively hampered by the lack of publicly available medical language resources for training, testing and evaluating NLP components. We suggest sharing statistical models derived from access-protected clinical documents as a reasonable substitute and provide solutions for sentence splitting,(More)
We here describe JuFiT, an easily adjustable rule engine which allows to filter non-natural terms (i.e., ones usually not occurring in running citation texts) from the Umls metathesaurus and even adds new terms to the UMLS (by rewriting non-natural terms). Unlike previous attempts (with MetaMap or Casper), JuFiT serves multilingual purposes in that it runs(More)
Translating huge medical terminologies like SNOMED CT is costly and time consuming. We present a methodology that acquires substring substitution rules for single words, based on the known similarity between medical words and their translations, due to their common Latin / Greek origin. Character translation rules are automatically acquired from pairs of(More)
We here report on efforts to computationally support the maintenance and extension of multilingual biomedical terminology resources. Our main idea is to treat term acquisition as a classification problem guided by term alignment in parallel multilingual corpora, using termhood information coming from of a named entity recognition system as a novel feature.(More)