Corpus ID: 244345586

Automatic Expansion and Retargeting of Arabic Offensive Language Training

Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish, Younes Samih
The rampant use of offensive language on social media has led to recent efforts to identify such language automatically. Although offensive language has general characteristics, attacks on specific entities may exhibit distinct phenomena, such as malicious alterations in the spelling of names. In this paper, we present a method for identifying entity-specific offensive language. We employ two key insights, namely that replies on Twitter often imply opposition and that some accounts are persistent in their…
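The spelling-alteration phenomenon mentioned above can be illustrated with a small sketch that flags tokens lying within a small edit distance of a target name. This is a hypothetical illustration, not the method described in the paper: the helper names, the threshold `max_dist=2`, and the example name are all assumptions chosen for demonstration.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # delete ca
                            curr[j - 1] + 1,     # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def altered_spellings(tokens, target, max_dist=2):
    """Hypothetical helper: tokens close to, but not identical to, the target name."""
    return [t for t in tokens
            if 0 < levenshtein(t.lower(), target.lower()) <= max_dist]


# Hypothetical example; "karim" stands in for any targeted entity name.
tokens = "people say kariim and krim but karim is fine".split()
print(altered_spellings(tokens, "karim"))
```

Exact matches (distance 0) are excluded so that only variant spellings are returned. A real system would need a much stricter filter, since short common words can also fall within a small edit distance of a short name.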

Figures and Tables from this paper


Arabic Offensive Language Classification on Twitter
This paper shows that a training set can be built rapidly from a seed list of offensive words, and that a character n-gram-based deep learning classifier trained on it can classify tweets effectively, with an F1 score of 90%.
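The two ingredients in the summary above, seed-list labeling and character n-gram features, can be sketched as follows. This is a simplified illustration under stated assumptions: the seed word and tweets are hypothetical placeholders, tokenization is plain whitespace splitting, and the paper's actual seed list and deep learning classifier are not reproduced.

```python
from collections import Counter
from typing import AbstractSet


def weak_label(tweet: str, seed_words: AbstractSet[str]) -> int:
    """Return 1 (offensive) if any whitespace token appears in the seed list, else 0."""
    # Simplification: whitespace tokenization; punctuation attached to a word blocks the match.
    return int(any(tok in seed_words for tok in tweet.lower().split()))


def char_ngrams(text: str, n_min: int = 2, n_max: int = 4) -> Counter:
    """Count every character n-gram whose length is between n_min and n_max inclusive."""
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


# Hypothetical placeholder seed list and tweets (not data from the paper).
seeds = {"idiot"}
tweets = ["You are an IDIOT", "Have a nice day"]
labels = [weak_label(t, seeds) for t in tweets]   # [1, 0]
features = [char_ngrams(t) for t in tweets]       # inputs for a downstream classifier
```

Character n-grams are a natural fit for noisy social media text because a misspelled or deliberately altered word still shares most of its substrings with the original form.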
Overview of OSACT4 Arabic Offensive Language Detection Shared Task
This paper provides an overview of the offensive language detection shared task at the 4th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT4), which also covered hate speech detection.
Automated Hate Speech Detection and the Problem of Offensive Language
This work used a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords and labeled a sample of these tweets into three categories: those containing hate speech, those containing only offensive language, and those containing neither.
Abusive Language Detection in Online User Content
This work presents a machine-learning-based method for detecting hate speech in online user comments from two domains, which outperforms a state-of-the-art deep learning approach, together with a corpus of user comments annotated for abusive language, the first of its kind.
Deep Learning for Hate Speech Detection in Tweets
Experiments on a benchmark dataset of 16K annotated tweets show that deep learning methods outperform state-of-the-art character/word n-gram methods by ~18 F1 points.
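Since several of these summaries report results as F1 scores, a minimal sketch of binary F1 for the positive class may help interpret them. It assumes a single positive label; the cited works may instead report macro- or weighted-averaged F1, which this sketch does not compute. A gain of "18 F1 points" corresponds to 0.18 on the 0 to 1 scale used here.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
print(f1_score(y_true, y_pred))  # precision = recall = 2/3, so F1 = 2/3
```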
Are they Our Brothers? Analysis and Detection of Religious Hate Speech in the Arabic Twittersphere
This work created the first publicly available Arabic dataset annotated for religious hate speech detection and the first Arabic lexicon of terms commonly found in religious discussions, with scores representing their polarity and strength. It also developed various classification models using lexicon-based, n-gram-based, and deep-learning-based approaches.
Detecting Hate Speech in Social Media
This paper aims to establish lexical baselines for this task by applying supervised classification methods to a recently released dataset annotated for this purpose, and achieves 78% accuracy in classifying posts into three classes.
Hate Speech Detection with Comment Embeddings
This work proposes to learn distributed low-dimensional representations of comments using recently proposed neural language models, that can then be fed as inputs to a classification algorithm, resulting in highly efficient and effective hate speech detectors.
Deep Learning for Detecting Cyberbullying Across Multiple Social Media Platforms
This is the first work that systematically analyzes cyberbullying detection on various topics across multiple social media platforms (SMPs) using deep-learning-based models and transfer learning.