SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval)
The results and main findings of SemEval-2019 Task 6 on Identifying and Categorizing Offensive Language in Social Media (OffensEval), based on a new dataset containing over 14,000 English tweets, are presented.
Predicting the Type and Target of Offensive Posts in Social Media
The Offensive Language Identification Dataset (OLID), a new dataset with tweets annotated for offensive content using a fine-grained three-layer annotation scheme, is compiled and made publicly available.
Benchmarking Aggression Identification in Social Media
The goal of the Shared Task on Aggression Identification, organised as part of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-1) at COLING 2018, was to develop a classifier that could discriminate between Overtly Aggressive, Covertly Aggressive, and Non-aggressive texts.
Findings of the 2016 Conference on Machine Translation
This paper presents the results of the WMT16 shared tasks, which included five machine translation (MT) tasks (standard news, IT-domain, biomedical, multimodal, pronoun), three evaluation tasks…
SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)
The task included three subtasks corresponding to the hierarchical taxonomy of the OLID schema from OffensEval-2019, and it was offered in five languages: Arabic, Danish, English, Greek, and Turkish.
Findings of the 2019 Conference on Machine Translation (WMT19)
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019. Participants were asked to build machine translation systems for any…
Findings of the VarDial Evaluation Campaign 2017
The VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects, which was organized as part of the fourth edition of the VarDial workshop at EACL’2017, is presented.
Offensive Language Identification in Greek
OGTD is a manually annotated dataset containing 4,779 posts from Twitter labeled as offensive or not offensive; several computational models trained and tested on this data are evaluated.
Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task
High-order character n-grams were the most successful feature, and the best classification approaches included traditional supervised learning methods such as SVM, logistic regression, and language models, while deep learning approaches did not perform very well.
Merging Comparable Data Sources for the Discrimination of Similar Languages: The DSL Corpus Collection
The DSL corpus collection was merged from three comparable corpora to provide a suitable dataset for automatic classification to discriminate similar languages and language varieties, and results of baseline discrimination experiments reporting accuracy of up to 87.4% are presented.