Detecting Abusive Albanian
@article{Nurce2021DetectingAA, title={Detecting Abusive Albanian}, author={Erida Nurce and Jorgel Keci and Leon Derczynski}, journal={ArXiv}, year={2021}, volume={abs/2107.13592} }
The ever growing usage of social media in the recent years has had a direct impact on the increased presence of hate speech and offensive speech in online platforms. Research on effective detection of such content has mainly focused on English and a few other widespread languages, while the leftover majority fail to have the same work put into them and thus cannot benefit from the steady advancements made in the field. In this paper we present Shaj, an annotated Albanian dataset for hate…
2 Citations
Hate Speech Classification in Bulgarian
- Computer Science · CLIB
- 2022
This work aggregates a real-world dataset from Bulgarian online forums, manually annotates 108,142 sentences, and develops and evaluates various classifiers on the dataset, finding that a support vector machine with a linear kernel trained on character-level TF-IDF features is the best model.
Human-annotated dataset for social media sentiment analysis for Albanian language
- Computer Science · Data in Brief
- 2022
References
Offensive Language and Hate Speech Detection for Danish
- Computer Science · LREC
- 2020
This work constructs DKhate, a Danish dataset of user-generated comments from various social media platforms that is, to the authors' knowledge, the first of its kind. The dataset is annotated for various types and targets of offensive language, and the work develops four automatic classification systems designed to work for both English and Danish.
Hate Speech Dataset from a White Supremacy Forum
- Computer Science · ALW
- 2018
A custom annotation tool has been developed to carry out the manual labelling task which, among other things, allows the annotators to choose whether to read the context of a sentence before labelling it.
A Corpus of Turkish Offensive Language on Social Media
- Computer Science · LREC
- 2020
Annotation guidelines are based on a careful review of the annotation practices of recent efforts for other languages, and results of automatically classifying the corpus using state-of-the-art text classification methods are presented.
Automated Hate Speech Detection and the Problem of Offensive Language
- Computer Science · ICWSM
- 2017
This work uses a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords and labels a sample of these tweets into three categories: those containing hate speech, those containing only offensive language, and those with neither.
Social Network Hate Speech Detection for Amharic Language
- Computer Science
- 2018
An Apache Spark-based model to classify Amharic Facebook posts and comments into hate and not hate is developed; it achieves promising results by making use of Spark's capabilities for big data.
Detecting Hate Speech on the World Wide Web
- Computer Science
- 2012
The definition of hate speech, the collection and annotation of the hate speech corpus, and a mechanism for detecting some commonly used methods of evading common "dirty word" filters are described.
SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval)
- Psychology, Computer Science · *SEMEVAL
- 2019
The results and the main findings of SemEval-2019 Task 6 on Identifying and Categorizing Offensive Language in Social Media (OffensEval), based on a new dataset containing over 14,000 English tweets, are presented.
Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter
- Computer Science · NAACL
- 2016
A list of criteria founded in critical race theory is provided and used to annotate a publicly available corpus of more than 16k tweets, and a dictionary based on the most indicative words in the data is presented.
Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter
- Computer Science · NLP+CSS@EMNLP
- 2016
It is found that amateur annotators are more likely than expert annotators to label items as hate speech, and that systems trained on expert annotations outperform systems trained on amateur annotations.
Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter at SemEval-2019 Task 5: Frequency Analysis Interpolation for Hate in Speech Detection
- Computer Science · SemEval@NAACL-HLT
- 2019
This document describes a text change of representation approach to the task of Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter, as part of SemEval-2019. The task is…