Exploring Hate Speech Detection with HateXplain and BERT

@article{Subramaniam2022ExploringHS,
  title={Exploring Hate Speech Detection with HateXplain and BERT},
  author={Arvind Subramaniam and Aryan Mehra and Sayani Kundu},
  journal={ArXiv},
  year={2022},
  volume={abs/2208.04489}
}
Hate speech takes many forms to target communities with derogatory comments, and takes humanity a step back in societal progress. HateXplain is a recently published and first dataset to use annotated spans in the form of 'rationales', along with speech classification categories and targeted communities, to make the classification more human-like, explainable, accurate and less biased. We tune BERT to perform this task in the form of rationales and class prediction, and compare our performance on…
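The joint objective sketched in the abstract can be pictured roughly as below. This is only an illustrative sketch under assumed names (BertWithRationales, joint_loss, the lam weight), not the authors' released implementation; it pairs a sequence-classification head for the hate/offensive/normal label with a token-level head for the rationale spans, using the HuggingFace transformers library.

import torch
import torch.nn as nn
from transformers import BertModel

class BertWithRationales(nn.Module):
    # Hypothetical two-head BERT: class prediction plus per-token rationale prediction.
    def __init__(self, model_name="bert-base-uncased", num_classes=3):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        self.class_head = nn.Linear(hidden, num_classes)   # hate / offensive / normal
        self.rationale_head = nn.Linear(hidden, 2)         # per token: rationale or not

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        class_logits = self.class_head(out.pooler_output)               # (batch, 3)
        rationale_logits = self.rationale_head(out.last_hidden_state)   # (batch, seq, 2)
        return class_logits, rationale_logits

def joint_loss(class_logits, rationale_logits, labels, rationale_labels,
               attention_mask, lam=1.0):
    # Cross-entropy on the label plus cross-entropy on the token rationales,
    # weighted by an assumed hyperparameter lam (not taken from the paper).
    ce = nn.CrossEntropyLoss()
    cls_loss = ce(class_logits, labels)
    active = attention_mask.view(-1) == 1          # ignore padding positions
    tok_loss = ce(rationale_logits.view(-1, 2)[active],
                  rationale_labels.view(-1)[active])
    return cls_loss + lam * tok_loss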

References

HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection

HateXplain is introduced, the first benchmark hate speech dataset covering multiple aspects of the issue; evaluating existing state-of-the-art models on it, the authors observe that models which utilize the human rationales for training perform better in reducing unintended bias towards target communities.

How Hateful are Movies? A Study and Prediction on Movie Subtitles

This research introduces a new dataset collected from the subtitles of six movies, where each utterance is annotated either as hate, offensive or normal, and shows that transfer learning from the social media domain is efficacious in classifying hate and offensive speech in movies through subtitles.

Lone Pine at SemEval-2021 Task 5: Fine-Grained Detection of Hate Speech Using BERToxic

BERToxic, a system that fine-tunes a pre-trained BERT model to locate toxic text spans in a given text and utilizes additional post-processing steps to refine the boundaries, is proposed.
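As a rough illustration of the boundary-refinement idea (a hypothetical helper, not code from the BERToxic system), a post-processing step can merge predicted toxic character spans that are separated by only a small gap, so split-up fragments become one contiguous span.

def merge_spans(spans, max_gap=1):
    # spans: list of (start, end) character offsets, end exclusive.
    if not spans:
        return []
    spans = sorted(spans)
    merged = [list(spans[0])]
    for start, end in spans[1:]:
        if start - merged[-1][1] <= max_gap:
            merged[-1][1] = max(merged[-1][1], end)  # close the small gap
        else:
            merged.append([start, end])
    return [tuple(s) for s in merged]

# Example: fragments (5, 10) and (11, 15) become a single span (5, 15).
print(merge_spans([(5, 10), (11, 15)]))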

WLV-RIT at SemEval-2021 Task 5: A Neural Transformer Framework for Detecting Toxic Spans

This paper presents the WLV-RIT entry for the SemEval-2021 Task 5: Toxic Spans Detection competition, and develops an open-source framework for multilingual detection of offensive spans, i.e., MUDES, based on neural transformers that detect toxic spans in texts.

Automated Hate Speech Detection and the Problem of Offensive Language

This work used a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords and labels a sample of these tweets into three categories: those containing hate speech, only offensive language, and those with neither.

Characterizing and Detecting Hateful Users on Twitter

This work obtains a sample of Twitter's retweet graph and annotates 4,972 users as hateful or normal, and finds that a node embedding algorithm outperforms content-based approaches for detecting both hateful and suspended users.

ltl.uni-due at SemEval-2019 Task 5: Simple but Effective Lexico-Semantic Features for Detecting Hate Speech in Twitter

This paper compares different configurations of shallow and deep learning approaches on the English data and uses the system that performs best in both sub-tasks: an SVM-based system with lexico-semantic features that beats the baseline system.

ERASER: A Benchmark to Evaluate Rationalized NLP Models

This work proposes the Evaluating Rationales And Simple English Reasoning (ERASER) benchmark to advance research on interpretable models in NLP, along with several metrics that aim to capture how well the rationales provided by models align with human rationales, and also how faithful these rationales are.
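For the plausibility side of such evaluation, agreement between model rationales and human rationales can be measured with a simple token-level F1; the helper below is an assumed illustration, not ERASER's official scorer.

def token_f1(predicted_tokens, gold_tokens):
    # Token-level F1 between the set of token positions a model marks as rationale
    # and the set of positions human annotators marked.
    pred, gold = set(predicted_tokens), set(gold_tokens)
    if not pred and not gold:
        return 1.0
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: model marks positions {3, 4, 5}; annotators marked {4, 5, 6}.
print(token_f1({3, 4, 5}, {4, 5, 6}))  # ~0.667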

XLNet: Generalized Autoregressive Pretraining for Language Understanding

XLNet is proposed, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT thanks to its autoregressive formulation.

Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification

A suite of threshold-agnostic metrics are introduced that provide a nuanced view of unintended bias in Machine Learning, by considering the various ways that a classifier’s score distribution can vary across designated groups.
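One concrete instance of such a threshold-agnostic metric is the subgroup AUC, the ROC-AUC computed only over examples that mention a given identity group. The sketch below assumes a pandas DataFrame with hypothetical column names and uses scikit-learn.

import pandas as pd
from sklearn.metrics import roc_auc_score

def subgroup_auc(df, group_col, label_col="label", score_col="score"):
    # Restrict to rows flagged as mentioning the identity group (boolean column),
    # then score the classifier's raw outputs against the gold labels.
    sub = df[df[group_col]]
    return roc_auc_score(sub[label_col], sub[score_col])

# Usage (assumed schema): df has one boolean column per identity group,
# a gold 'label' column, and a model 'score' column.
# print(subgroup_auc(df, group_col="mentions_muslim"))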