Learn More
Multilingual terminological resources can be drawn from parallel corpora in the languages of interest, possibly exploiting machine translation solutions for term identification. This main objective of the CLEF-ER challenge involves parallel corpora in English and other languages. The challenge organisers have gathered and normalized documents from the(More)
We here report on efforts to computationally support the maintenance and extension of multilingual biomedical terminology resources. Our main idea is to treat term acquisition as a classification problem guided by term alignment in parallel multilingual corpora, using termhood information coming from of a named entity recognition system as a novel feature.(More)
Confidential corpora from the medical, enterprise, security or intelligence domains often contain sensitive raw data which lead to severe restrictions as far as the public accessibility and distribution of such language resources are concerned. The enforcement of strict mechanisms of data protection consitutes a serious barrier for progress in language(More)
Our research aims at tracking the semantic evolution of the lexicon over time. For this purpose, we investigated two well-known training protocols for neural language models in a synchronic experiment and encountered several problems relating to accuracy and reliability. We were able to identify critical parameters for improving the underlying protocols in(More)
We assess the reliability and accuracy of (neural) word embeddings for both modern and historical English and German. Our research provides deeper insights into the empirically justified choice of optimal training methods and parameters. The overall low reliability we observe, nevertheless, casts doubt on the suitability of word neighborhoods in embedding(More)
We describe a novel method for measuring affective language in historical texts by expanding an affective lexicon and jointly adapting it to prior language stages. We automatically construct a lexicon for word-emotion association of 18th and 19th century German which is then validated against expert ratings. Subsequently, this resource is used to identify(More)
Translating huge medical terminologies like SNOMED CT is costly and time consuming. We present a methodology that acquires substring substitution rules for single words, based on the known similarity between medical words and their translations, due to their common Latin / Greek origin. Character translation rules are automatically acquired from pairs of(More)