Ceasing hate with MoH: Hate Speech Detection in Hindi-English Code-Switched Language

  title={Ceasing hate with MoH: Hate Speech Detection in Hindi-English Code-Switched Language},
  author={Arushi Sharma and Anubha Kabra and Minni Jain},
  journal={Inf. Process. Manag.},

Homophobic and Hate Speech Detection Using Multilingual-BERT Model on Turkish Social Media

This work presents a pre-trained Multilingual Bidirectional Encoder Representations from Transformers (M-BERT) model that detects whether Turkish comments on social media contain homophobic or related hate speech, with the aim of making filters for such content on social networks more effective.

Novel Hate Speech Detection Using Word Cloud Visualization and Ensemble Learning Coupled with Count Vectorizer

A computational framework is presented to examine the computational challenges behind hate speech detection and to generate high-performance results. Random Forest surpasses the other methods with 95% accuracy, and a word cloud displays the most prominent tweets responsible for hateful sentiments.

A New Corpus and Lexicon for Offensive Tamazight Language Detection

This paper addresses offensive language detection in Tamazight, one of the under-resourced languages that are still in their infancy and lack a standard orthography, and proposes a new corpus of offensive Tamazight language and a new lexicon of offensive and abusive Tamazight words.

Detecting racism and xenophobia using deep learning models on Twitter data: CNN, LSTM and BERT

This work aimed to develop a predictive model based on BERT, capable of detecting racist and xenophobic messages in tweets written in Spanish, and found that the model with the best metrics was BETO, a BERT-based model trained only on texts written in Spanish.

A Transformer Based Approach for Abuse Detection in Code Mixed Indic Languages.

Experimental analysis of four state-of-the-art transformer-based models, namely XLM-RoBERTa, Indic-BERT, MuRIL-BERT, and mBERT, shows that XLM-RoBERTa with BiGRU outperforms the others, which supports the view that the combined model fits the data better, possibly due to its code-mixed nature.



Hindi-English Hate Speech Detection: Author Profiling, Debiasing, and Practical Perspectives

This work introduces a three-tier pipeline that employs profanity modeling, deep graph embeddings, and author profiling to retrieve instances of hate speech in Hindi-English code-switched language (Hinglish) on social media platforms like Twitter.

Hate Speech Detection in Hindi-English Code-Mixed Social Media Text

This paper deals with the task of identifying hate speech in code-mixed social media text using two architectures, namely a sub-word level LSTM model and a hierarchical LSTM model with attention based on phonemic sub-words.

Hate Me, Hate Me Not: Hate Speech Detection on Facebook

This work proposes a variety of hate categories and designs and implements two classifiers for the Italian language, based on different learning algorithms: the first based on Support Vector Machines (SVM) and the second on a particular Recurrent Neural Network named Long Short Term Memory (LSTM).

A BERT-Based Transfer Learning Approach for Hate Speech Detection in Online Social Media

This study introduces a novel transfer learning approach based on an existing pre-trained language model called BERT (Bidirectional Encoder Representations from Transformers) and investigates the ability of BERT to capture hateful context within social media content by using new fine-tuning methods based on transfer learning.

A Dataset of Hindi-English Code-Mixed Social Media Text for Hate Speech Detection

This work presents a Hindi-English code-mixed dataset consisting of tweets posted on Twitter and proposes a supervised classification system for detecting hate speech in the text using various character-level, word-level, and lexicon-based features.

Did you offend me? Classification of Offensive Tweets in Hinglish Language

The proposed MIMCT model outperforms the baseline supervised classification models, transfer learning based CNN and LSTM models to establish itself as the state of the art in the unexplored domain of Hinglish offensive text classification.

Hate Speech Detection: A Solved Problem? The Challenging Case of Long Tail on Twitter

This work proposes novel Deep Neural Network structures serving as effective feature extractors, and explores the usage of background information in the form of different word embeddings pre-trained from unlabelled corpora to address the very challenging nature of identifying hate speech on social media.

Automated Hate Speech Detection and the Problem of Offensive Language

This work uses a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords and labels a sample of these tweets into three categories: those containing hate speech, those containing only offensive language, and those with neither.

Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter

A list of criteria founded in critical race theory is provided, and these are used to annotate a publicly available corpus of more than 16k tweets and to present a dictionary based on the most indicative words in the data.

Code-Mixing in Social Media Text. The Last Language Identification Frontier?

An initial study to understand the characteristics of code-mixing in the social media context and a system developed to automatically detect language boundaries in code-mixed social media text, exemplified by Facebook messages in mixed English-Bengali and English-Hindi are reported.