Corpus ID: 227118923

Are Chess Discussions Racist? An Adversarial Hate Speech Data Set

@article{Sarkar2021AreCD,
  title={Are Chess Discussions Racist? An Adversarial Hate Speech Data Set},
  author={Rupak Sarkar and Ashiqur R. KhudaBukhsh},
  journal={ArXiv},
  year={2021},
  volume={abs/2011.10280}
}
On June 28, 2020, while presenting a chess podcast on Grandmaster Hikaru Nakamura, Antonio Radic's YouTube handle was blocked because it contained "harmful and dangerous" content. YouTube gave no further specific reason, and the channel was reinstated within 24 hours. However, Radic speculated that, given the current political situation, a reference to "black against white", albeit in the context of chess, earned him this temporary ban. In this paper, via a substantial corpus of 681,995…

'Beach' to 'Bitch': Inadvertent Unsafe Transcription of Kids' Content on YouTube
Over the last few years, YouTube Kids has emerged as one of the highly competitive alternatives to television for children’s entertainment. Consequently, YouTube Kids’ content should receive an

References

All You Need is "Love": Evading Hate Speech Detection
TLDR
It is argued that for successful hate speech detection, model architecture is less important than the type of data and labeling criteria, and all proposed detection techniques are brittle against adversaries who can (automatically) insert typos, change word boundaries or add innocuous words to the original hate speech.
Hate Speech Dataset from a White Supremacy Forum
TLDR
A custom annotation tool has been developed to carry out the manual labelling task which, among other things, allows the annotators to choose whether to read the context of a sentence before labelling it.
Hate Me, Hate Me Not: Hate Speech Detection on Facebook
TLDR
This work proposes a variety of hate categories and designs and implements two classifiers for the Italian language, based on different learning algorithms: the first based on Support Vector Machines (SVM) and the second on a particular Recurrent Neural Network named Long Short Term Memory (LSTM).
Hate Speech Detection is Not as Easy as You May Think: A Closer Look at Model Validation
TLDR
Analysis of the experimental methodology and results reported by state-of-the-art systems indicates that supervised approaches achieve almost perfect performance, but only within specific datasets, which points to methodological issues as well as an important dataset bias.
Automated Hate Speech Detection and the Problem of Offensive Language
TLDR
This work uses a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords and labels a sample of these tweets into three categories: those containing hate speech, those containing only offensive language, and those with neither.
Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings
TLDR
This work proposes a method to debias word embeddings in multiclass settings such as race and religion, extending the work of Bolukbasi et al. (2016) beyond the binary setting, such as binary gender.
Common Sense Reasoning for Detection, Prevention, and Mitigation of Cyberbullying
TLDR
An “air traffic control”-like dashboard is proposed, which alerts moderators to large-scale outbreaks that appear to be escalating or spreading and helps them prioritize the current deluge of user complaints.
What about Hate Speech
In real (analog) life, we are rarely exposed to open insults or hatred. On the internet, however, we far more often read angry, hateful comments, snaps, tweets, and posts
Word embeddings quantify 100 years of gender and ethnic stereotypes
TLDR
A framework to demonstrate how the temporal dynamics of the embedding helps to quantify changes in stereotypes and attitudes toward women and ethnic minorities in the 20th and 21st centuries in the United States is developed.
Distributed Representations of Words and Phrases and their Compositionality
TLDR
This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.