Hate speech detection: Challenges and solutions

@article{MacAvaney2019HateSD,
  title={Hate speech detection: Challenges and solutions},
  author={Sean MacAvaney and Hao-Ren Yao and Eugene Yang and Katina Russell and Nazli Goharian and Ophir Frieder},
  journal={PLoS ONE},
  year={2019},
  volume={14}
}
As online content continues to grow, so does the spread of hate speech. We identify and examine challenges faced by online automatic approaches for hate speech detection in text. Among these difficulties are subtleties in language, differing definitions on what constitutes hate speech, and limitations of data availability for training and testing of these systems. Furthermore, many recent approaches suffer from an interpretability problem—that is, it can be difficult to understand why the… 

Deep Learning for Hate Speech Detection: A Comparative Study
TLDR
A large-scale empirical comparison of deep and shallow hate-speech detection methods, mediated through the three most commonly used datasets, to illuminate progress in the area, and identify strengths and weaknesses in the current state of the art.
Unsupervised Domain Adaptation for Hate Speech Detection Using a Data Augmentation Approach
TLDR
This work proposes an unsupervised domain adaptation approach to augment labeled data for hate speech detection, and shows its approach improves Area under the Precision/Recall curve by as much as 42% and recall by as much as 278%, with no loss (and in some cases a significant gain) in precision.
Explainability of Hate Speech Detection Models
TLDR
Interpretability techniques empirically demonstrate that — and also illustrate why — social features are indeed the reason for performance gains, and propose the usage of artificially crafted messages to examine the behaviour of models beyond the dataset they trained on.
A Feature Extraction based Model for Hate Speech Identification
TLDR
Presents the TU Berlin team's experiments and results on the shared task on hate speech and offensive content identification in Indo-European languages 2021, in which the transfer learning-based models achieved the best results in both subtasks.
Improving Cross-Domain Hate Speech Detection by Reducing the False Positive Rate
TLDR
An SVM approach is introduced that significantly improves on state-of-the-art results when combined with deep learning models through a simple majority-voting ensemble, mainly due to a reduction of the false positive rate.
BERT-BU12 Hate Speech Detection Using Bidirectional Encoder-Decoder
TLDR
A novel method of hate speech detection based on the concept of attention networks using the BERT attention model is introduced and it is shown that this model outperforms all the state-of-the-art methods by almost 4%.
Challenges of Hate Speech Detection in Social Media
TLDR
A deep natural language processing (NLP) model—combining convolutional and recurrent layers—for the automatic detection of hate speech in social media data is proposed, and it was shown that by doing so, it was possible to significantly increase the classification score attained.
Meta AI at Arabic Hate Speech 2022: MultiTask Learning with Self-Correction for Hate Speech Classification
TLDR
The solution is an ensemble of models that employs multitask learning and a self-consistency correction method yielding 82.7% on the hate speech subtask—reflecting a 3.4% relative improvement compared to previous work.
Understanding and Interpreting the Impact of User Context in Hate Speech Detection
TLDR
This work reveals that user features play a role in the model’s decision and how they affect the feature space learned by the model, and shows how such techniques can be combined to better understand the model and to detect unintended bias.
Ai for Tackling Hate speech
TLDR
This research-in-progress paper first integrates definitions and concepts to build a detailed understanding of hate speech, and then elaborates on the regulatory context and its geographical boundaries for the development of AI systems that tackle hate speech.

References

SHOWING 1-10 OF 38 REFERENCES
Automated Hate Speech Detection and the Problem of Offensive Language
TLDR
This work used a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords and labeled a sample of these tweets into three categories: those containing hate speech, those with only offensive language, and those with neither.
Improving Hate Speech Detection with Deep Learning Ensembles
TLDR
An ensemble method is adapted for use with neural networks and presented to better classify hate speech; it achieves a nearly 5 point improvement in F-measure compared to the original work on a publicly available hate speech evaluation dataset.
A Survey on Automatic Detection of Hate Speech in Text
TLDR
This survey organizes and describes the current state of the field, providing a structured overview of previous approaches, including core algorithms, methods, and main features used, and provides a unifying definition of hate speech.
Hate Speech Dataset from a White Supremacy Forum
TLDR
A custom annotation tool has been developed to carry out the manual labelling task which, among other things, allows the annotators to choose whether to read the context of a sentence before labelling it.
Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee Crisis
TLDR
It is concluded that the presence of hate speech should perhaps not be considered a binary yes-or-no decision, and that raters need more detailed instructions for the annotation, whose reliability was very low overall.
Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter
TLDR
A list of criteria founded in critical race theory is provided and used to annotate a publicly available corpus of more than 16k tweets, and a dictionary based on the most indicative words in the data is presented.
Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter
TLDR
It is found that amateur annotators are more likely than expert annotators to label items as hate speech, and that systems trained on expert annotations outperform systems trained on amateur annotations.
Cyberbullying Detection Task: the EBSI-LIA-UNAM System (ELU) at COLING’18 TRAC-1
TLDR
This study aims to assess how well both classical and state-of-the-art vector space modeling methods enable well-known learning machines to identify aggression levels in social network cyberbullying.
Detecting Hate Speech on Twitter Using a Convolution-GRU Based Deep Neural Network
TLDR
This paper introduces a new method based on a deep neural network combining convolutional and gated recurrent networks that is able to capture both word sequence and order information in short texts, and sets a new benchmark by outperforming on 6 out of 7 datasets by between 1 and 13% in F1.
Is Attention Interpretable?
TLDR
While attention noisily predicts input components’ overall importance to a model, it is by no means a fail-safe indicator, and there are many ways in which this does not hold, where gradient-based rankings of attention weights better predict their effects than their magnitudes.