• Corpus ID: 227118923

Are Chess Discussions Racist? An Adversarial Hate Speech Data Set

Rupak Sarkar and Ashiqur R. KhudaBukhsh
On June 28, 2020, while presenting a chess podcast on Grandmaster Hikaru Nakamura, Antonio Radic's YouTube handle was blocked because it contained "harmful and dangerous" content. YouTube did not give any further specific reason, and the channel was reinstated within 24 hours. However, Radic speculated that, given the current political situation, a reference to "black against white", albeit in the context of chess, earned him this temporary ban. In this paper, via a substantial corpus of 681,995…


'Beach' to 'Bitch': Inadvertent Unsafe Transcription of Kids' Content on YouTube
Over the last few years, YouTube Kids has emerged as one of the highly competitive alternatives to television for children’s entertainment. Consequently, YouTube Kids’ content should receive an
Hate Speech Criteria: A Modular Approach to Task-Specific Hate Speech Definitions
Hate speech criteria is presented, developed with perspectives from law and social science, with the aim of helping researchers create more precise definitions and annotation guidelines and an overview of the properties of English datasets from hatespeechdata.com that may help select the most suitable dataset for a specific scenario.


All You Need is "Love": Evading Hate Speech Detection
It is argued that for successful hate speech detection, model architecture is less important than the type of data and labeling criteria, and all proposed detection techniques are brittle against adversaries who can (automatically) insert typos, change word boundaries or add innocuous words to the original hate speech.
Hate Speech Dataset from a White Supremacy Forum
A custom annotation tool has been developed to carry out the manual labelling task which, among other things, allows the annotators to choose whether to read the context of a sentence before labelling it.
Hate Me, Hate Me Not: Hate Speech Detection on Facebook
This work proposes a variety of hate categories and designs and implements two classifiers for the Italian language, based on different learning algorithms: the first based on Support Vector Machines (SVM) and the second on a particular Recurrent Neural Network named Long Short Term Memory (LSTM).
Hate Speech Detection is Not as Easy as You May Think: A Closer Look at Model Validation
Analysis of the experimental methodology and results reported by state-of-the-art systems indicates that supervised approaches achieve almost perfect performance but only within specific datasets, which points to methodological issues as well as an important dataset bias.
Automated Hate Speech Detection and the Problem of Offensive Language
This work uses a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords and labels a sample of these tweets into three categories: those containing hate speech, those containing only offensive language, and those with neither.
Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings
This work proposes a method to debias word embeddings in multiclass settings such as race and religion, extending the work of Bolukbasi et al. (2016) beyond binary settings such as binary gender.
Common Sense Reasoning for Detection, Prevention, and Mitigation of Cyberbullying
An “air traffic control”-like dashboard is proposed, which alerts moderators to large-scale outbreaks that appear to be escalating or spreading and helps them prioritize the current deluge of user complaints.
What about Hate Speech
In real (analog) life, we are rarely exposed to open insults or hatred. On the internet, by contrast, we read angry, hateful comments, snaps, tweets, and posts far more often
Word embeddings quantify 100 years of gender and ethnic stereotypes
A framework is developed to demonstrate how the temporal dynamics of the embedding help to quantify changes in stereotypes and attitudes toward women and ethnic minorities in the 20th and 21st centuries in the United States.
Distributed Representations of Words and Phrases and their Compositionality
This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.