Egon W. Stemle

This article describes the system that participated in the part-of-speech (PoS) tagging subtask of the EmpiriST 2015 shared task on automatic linguistic annotation of computer-mediated communication / social media. The system combines a small assortment of trending techniques, which implement mature methods from NLP and ML, to achieve competitive results on PoS…
The DiDi corpus of South Tyrolean data of computer-mediated communication (CMC) is a multilingual sociolinguistic corpus. It consists of around 600,000 tokens collected from 136 profiles of Facebook users residing in South Tyrol, Italy. In keeping with the multilingual situation of the territory, the main languages of the corpus are…
This article describes the system that participated in the POS tagging for Italian Social Media Texts (PoSTWITA) task of EVALITA 2016, the 5th evaluation campaign of Natural Language Processing (NLP) and speech tools for the Italian language. The work is a continuation of Stemle (2016), with minor modifications to the system and different…
Developing content extraction methods for Humanities domains raises a number of challenges, from the abundance of non-standard entity types, to their complexity, to the scarcity of data. Close collaboration with Humanities scholars is essential to address these challenges. We discuss an annotation schema for archaeological texts developed in collaboration…