Detecting East Asian Prejudice on Social Media

Bertie Vidgen, Austin Botelho, David A. Broniatowski, Ella Guest, Matthew Hall, Helen Z. Margetts, Rebekah Tromble, Zeerak Waseem, Scott A. Hale
Published in: Workshop on Abusive Language Online

During COVID-19 concerns have heightened about the spread of aggressive and hateful language online, especially hostility directed against East Asia and East Asian people. We report on a new dataset and the creation of a machine learning classifier that categorizes social media posts from Twitter into four classes: Hostility against East Asia, Criticism of East Asia, Meta-discussions of East Asian prejudice, and a neutral class. The classifier achieves a macro-F1 score of 0.83. We then conduct… 
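
The macro-F1 score quoted above is the unweighted mean of the per-class F1 scores, so each of the four classes counts equally regardless of how many posts it contains. The following is a minimal sketch of that calculation, not the authors' evaluation code: the short class names, the `macro_f1` helper, and the toy labels are all assumptions made for illustration.

```python
from collections import Counter

# Short class names are illustrative stand-ins for the four classes in the abstract.
CLASSES = ["hostility", "criticism", "meta-discussion", "neutral"]


def macro_f1(y_true, y_pred, classes=CLASSES):
    """Return the unweighted mean of per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    f1_scores = []
    for c in classes:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1_scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(f1_scores) / len(classes)


# Invented toy labels, not data from the paper.
y_true = ["hostility", "criticism", "neutral", "neutral", "meta-discussion", "hostility"]
y_pred = ["hostility", "neutral", "neutral", "neutral", "meta-discussion", "criticism"]
score = macro_f1(y_true, y_pred)
print(round(score, 3))
```

Because the mean is unweighted, a rare class that the classifier handles poorly (here the hypothetical "criticism" label) pulls the score down as much as a frequent one would.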

Predicting Anti-Asian Hateful Users on Twitter during COVID-19

This work applies natural language processing techniques to characterize social media users who began to post anti-Asian hate messages during COVID-19 and shows that it is possible to predict who later publicly posted anti-Asian slurs.

Racism is a virus: anti-asian hate and counterspeech in social media during the COVID-19 crisis

Analysis of the social network reveals that hateful and counterspeech users interact and engage extensively with one another, instead of living in isolated polarized communities, and finds that nodes were highly likely to become hateful after being exposed to hateful content in the year 2020.

"Stop Asian Hate!" : Refining Detection of Anti-Asian Hate Speech During the COVID-19 Pandemic

It is demonstrated that the developed model, together with an accompanying training regimen that incorporates agreement between annotators, is able to identify hate speech that is systematically missed by established hate speech detectors.

Characterizing Anti-Asian Rhetoric During The COVID-19 Pandemic: A Sentiment Analysis Case Study on Twitter

This work combines and enhances publicly available resources with the authors' own manually annotated set of tweets to create machine learning classification models that characterize Sinophobic behavior, and applies this classifier to a pre-filtered longitudinal dataset spanning two years of pandemic-related tweets.

Deep-Cov19-Hate: A Textual-Based Novel Approach for Automatic Detection of Hate Speech in Online Social Networks throughout COVID-19 with Shallow and Deep Learning Models

A text-based study of COVID-19-related hate speech (HS) shared in online social networks was carried out with Shallow Learning (SL) and Deep Learning (DL) methods. The promising results of all approaches applied to hate speech detection (HSD) are expected to inform solutions to many other social media and network problems related to COVID-19.

Multiplex Anti-Asian Sentiment before and during the Pandemic: Introducing New Datasets from Twitter Mining

New datasets from Twitter related to anti-Asian hate sentiment before and during the pandemic are introduced and state-of-the-art hate speech classifiers are used to discern whether these tweets express hatred.

Tackling racial bias in automated online hate detection: Towards fair and accurate detection of hateful users with geometric deep learning

This work demonstrates that adding network information with geometric deep learning produces a more accurate classifier compared with other techniques that either exclude network information entirely or incorporate it through manual feature engineering, and shows that such information also leads to fairer outcomes.

German Abusive Language Dataset with Focus on COVID-19

The contributions are a methodology for collecting abusive language data from Twitter with a substantial amount of abusive and hateful content, and a German abusive language dataset with 4,960 annotated tweets centered on COVID-19, intended to aid researchers in improving abusive language detection.

A new decade for social changes

The spread of Covid-19 worldwide has been associated with hateful and racist speech on social media, which sometimes encourages violence and bullying in different communities. Some officials,

No Rumours Please! A Multi-Indic-Lingual Approach for COVID Fake-Tweet Detection

This work proposes an approach to detect fake news about COVID-19 early on from social media, such as tweets, for multiple Indic-Languages besides English, and establishes the first benchmark for two Indic languages, Hindi and Bengali.



Racial Bias in Hate Speech and Abusive Language Detection Datasets

Evidence of systematic racial bias in five different sets of Twitter data annotated for hate speech and abusive language is examined, as classifiers trained on them tend to predict that tweets written in African-American English are abusive at substantially higher rates.

The Risk of Racial Bias in Hate Speech Detection

This work proposes *dialect* and *race priming* as ways to reduce the racial bias in annotation, showing that when annotators are made explicitly aware of an AAE tweet’s dialect they are significantly less likely to label the tweet as offensive.

Hate Speech Dataset from a White Supremacy Forum

A custom annotation tool has been developed to carry out the manual labelling task which, among other things, allows the annotators to choose whether to read the context of a sentence before labelling it.

Pro-Russian Biases in Anti-Chinese Tweets about the Novel Coronavirus

It is found that this corpus as a whole contains pro-Russian attitudes that are not present in a control Twitter corpus of general tweets. This may indicate abusive account activity associated with rapid changes in attitudes around the COVID-19 public health crisis, suggesting potential information operations.

Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter

A list of criteria founded in critical race theory is provided and used to annotate a publicly available corpus of more than 16k tweets, and a dictionary based on the most indicative words in the data is presented.

CONAN - COunter NArratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech

This paper describes the creation of the first large-scale, multilingual, expert-based dataset of hate-speech/counter-narrative pairs, built with the effort of more than 100 operators from three different NGOs that applied their training and expertise to the task.

"Go eat a bat, Chang!": An Early Look on the Emergence of Sinophobic Behavior on Web Communities in the Face of COVID-19

It is found that COVID-19 indeed drives the rise of Sinophobia on the Web and that the dissemination of Sinophobic content is a cross-platform phenomenon: it exists on fringe Web communities like /pol/, and to a lesser extent on mainstream ones like Twitter.

Automated Hate Speech Detection and the Problem of Offensive Language

This work uses a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords and labels a sample of these tweets into three categories: those containing hate speech, those containing only offensive language, and those with neither.

All You Need is "Love": Evading Hate Speech Detection

It is argued that for successful hate speech detection, model architecture is less important than the type of data and labeling criteria, and all proposed detection techniques are brittle against adversaries who can (automatically) insert typos, change word boundaries or add innocuous words to the original hate speech.