Ruddit: Norms of Offensiveness for English Reddit Comments

Rishav Hada, Sohi Sudhir, Pushkar Mishra, Helen Yannakoudakis, Saif M. Mohammad, Ekaterina Shutova
On social media platforms, hateful and offensive language negatively impacts the mental well-being of users and the participation of people from diverse backgrounds. Automatic methods to detect offensive language have largely relied on datasets with categorical labels. However, comments can vary in their degree of offensiveness. We create the first dataset of English-language Reddit comments that has fine-grained, real-valued scores between -1 (maximally supportive) and 1 (maximally offensive…
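The abstract describes real-valued offensiveness scores in [-1, 1] derived from human annotation. A minimal sketch of one common way such scores are aggregated from comparative "most/least offensive" judgements, a Best–Worst Scaling-style count; this particular counting scheme is an assumption for illustration, not a method stated in the excerpt:

```python
from collections import Counter

def bws_scores(annotations):
    """Aggregate best/worst judgements into real-valued scores in [-1, 1].

    `annotations` is a list of (items, best, worst) tuples, where `items`
    is the group of comments shown together, and `best`/`worst` are the
    comments chosen as most and least offensive in that group.
    """
    best, worst, appearances = Counter(), Counter(), Counter()
    for items, b, w in annotations:
        for item in items:
            appearances[item] += 1
        best[b] += 1
        worst[w] += 1
    # Score: fraction of appearances chosen "best" minus fraction chosen "worst".
    return {item: (best[item] - worst[item]) / appearances[item]
            for item in appearances}
```

An item always picked as most offensive scores 1, one always picked as least offensive scores -1, matching the score range in the abstract.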


BERT-based Approach to Arabic Hate Speech and Offensive Language Detection in Twitter: Exploiting Emojis and Sentiment Analysis
  • M. Althobaiti
  • Computer Science
    International Journal of Advanced Computer Science and Applications
  • 2022
The BERT-based model gives the best results on all three tasks, surpassing the best benchmark systems in the literature; the use of sentiment analysis and emoji descriptions as features appended to the textual content of the tweets is also investigated.
CRUSH: Contextually Regularized and User anchored Self-supervised Hate speech Detection
This work introduces CRUSH, a framework for hate speech detection using user-anchored self-supervision and contextual regularization, and secures a 1-12% improvement in test-set metrics over the best-performing previous approaches on two types of tasks and multiple popular English-language social-networking datasets.
Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts
The stance of dialogue model responses in offensive Reddit conversations is investigated and the effectiveness of controllable text generation methods to mitigate the tendency of neural dialogue models to agree with offensive comments is quantified.
Analyzing the Intensity of Complaints on Social Media
This work presents the first computational-linguistics study of measuring the intensity of complaints from text, and shows that complaint intensity can be predicted by computational models, with the best model achieving a mean squared error of 0.11.
WLV-RIT at GermEval 2021: Multitask Learning with Transformers to Detect Toxic, Engaging, and Fact-Claiming Comments
This paper addresses the identification of toxic, engaging, and fact-claiming comments on social media using large pre-trained transformer models and multitask learning.


HateBERT: Retraining BERT for Abusive Language Detection in English
HateBERT, a re-trained BERT model for abusive language detection in English, is introduced, and a battery of experiments comparing the portability of the fine-tuned models across datasets is discussed, suggesting that portability is affected by the compatibility of the annotated phenomena.
SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval)
The results and the main findings of SemEval-2019 Task 6 on Identifying and Categorizing Offensive Language in Social Media (OffensEval), based on a new dataset containing over 14,000 English tweets, are presented.
Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter
A list of criteria founded in critical race theory is provided; these criteria are used to annotate a publicly available corpus of more than 16k tweets, and a dictionary based on the most indicative words in the data is presented.
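A dictionary of the "most indicative words" for a class can be built in several ways; a minimal sketch using a smoothed frequency-ratio heuristic, which is an illustrative assumption and not necessarily the method used in that paper:

```python
from collections import Counter

def indicative_words(pos_docs, neg_docs, k=10, smoothing=1.0):
    """Rank words by smoothed frequency ratio between two document classes."""
    pos = Counter(w for d in pos_docs for w in d.lower().split())
    neg = Counter(w for d in neg_docs for w in d.lower().split())
    vocab = set(pos) | set(neg)
    # Additive smoothing avoids division by zero for class-exclusive words.
    ratio = {w: (pos[w] + smoothing) / (neg[w] + smoothing) for w in vocab}
    return sorted(vocab, key=lambda w: ratio[w], reverse=True)[:k]
```

Words frequent in the positive class but rare in the negative class rank highest; more robust variants weight by log-odds or PMI.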
Automated Hate Speech Detection and the Problem of Offensive Language
This work used a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords and labeled a sample of these tweets into three categories: those containing hate speech, those with only offensive language, and those with neither.
Racial Bias in Hate Speech and Abusive Language Detection Datasets
Evidence of systematic racial bias in five different sets of Twitter data annotated for hate speech and abusive language is examined, as classifiers trained on them tend to predict that tweets written in African-American English are abusive at substantially higher rates.
The Risk of Racial Bias in Hate Speech Detection
This work proposes *dialect* and *race priming* as ways to reduce the racial bias in annotation, showing that when annotators are made explicitly aware of an AAE tweet’s dialect they are significantly less likely to label the tweet as offensive.
Transfer Learning from LDA to BiLSTM-CNN for Offensive Language Detection in Twitter
Transfer learning in general improves offensive language detection, and the effect of three different strategies to mitigate the negative effects of 'catastrophic forgetting' during transfer learning is investigated.
Reducing Gender Bias in Abusive Language Detection
Three mitigation methods, including debiased word embeddings, gender swap data augmentation, and fine-tuning with a larger corpus, can effectively reduce model bias by 90-98% and can be extended to correct model bias in other scenarios.
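Of the three mitigation methods named above, gender-swap data augmentation is the simplest to illustrate. A minimal word-level sketch; the swap list is illustrative, not the paper's, and a production version would handle casing and ambiguous forms (e.g. "her" mapping to either "him" or "his") with POS information:

```python
# Illustrative pronoun/noun swap table for gender-swap augmentation.
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his",
         "him": "her", "man": "woman", "woman": "man"}

def gender_swap(text):
    """Return a gender-swapped copy of `text` via word-level substitution."""
    return " ".join(SWAPS.get(tok, tok) for tok in text.lower().split())
```

Training on both the original and swapped copies of each example discourages the model from associating offensiveness with gendered terms.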
Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter
It is found that amateur annotators are more likely than expert annotators to label items as hate speech, and that systems training on expert annotations outperform systems trained on amateur annotations.
Towards a Comprehensive Taxonomy and Large-Scale Annotated Corpus for Online Slur Usage
This work provides an annotation guide that outlines 4 main categories of online slur usage, which are further divided into a total of 12 sub-categories and presents a publicly available corpus based on this taxonomy, allowing researchers to evaluate classifiers on a wider range of speech containing slurs.