Reducing Gender Bias in Abusive Language Detection

@inproceedings{Park2018ReducingGB,
  title={Reducing Gender Bias in Abusive Language Detection},
  author={Ji Ho Park and Jamin Shin and Pascale Fung},
  booktitle={EMNLP},
  year={2018}
}
Abusive language detection models tend to be biased toward identity words of a certain group of people because of imbalanced training datasets. For example, "You are a good woman" was classified as "sexist" by a model trained on an existing dataset. Such model bias is an obstacle to making models robust enough for practical use. In this work, we measure these biases on models trained with different datasets, while analyzing the effect of different pre-trained word embeddings and model…
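The failure mode described above can be probed directly: fill neutral templates with paired gendered identity terms and check whether a trained model flags the otherwise-identical sentences at different rates. The sketch below assumes a hypothetical `predict_abusive` scoring function standing in for any trained detector; the templates and word pairs are illustrative, not the paper's evaluation set.

```python
# Minimal identity-term bias probe (sketch; `predict_abusive` is a placeholder).
from itertools import product

TEMPLATES = ["You are a good {}", "I met a {} yesterday", "That {} is my friend"]
IDENTITY_PAIRS = [("woman", "man"), ("girl", "boy"), ("mother", "father")]

def predict_abusive(sentence: str) -> float:
    """Hypothetical stand-in: probability that `sentence` is abusive."""
    return 0.0  # replace with a real model's score

def flag_rate_gap(predict_fn, threshold: float = 0.5) -> float:
    """Mean difference in flag rate between female- and male-term sentences."""
    gaps = []
    for template, (female, male) in product(TEMPLATES, IDENTITY_PAIRS):
        f_flag = predict_fn(template.format(female)) >= threshold
        m_flag = predict_fn(template.format(male)) >= threshold
        gaps.append(int(f_flag) - int(m_flag))
    return sum(gaps) / len(gaps)

print(f"Female-minus-male flag-rate gap: {flag_rate_gap(predict_abusive):+.3f}")
```

An unbiased model should score near zero here; a positive gap reproduces the "You are a good woman" failure above.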
Citations

Investigating Sampling Bias in Abusive Language Detection
This work reproduces the investigation of Wiegand et al. (2019) to determine differences between sampling strategies, and shows that differences in the textual source can have a greater effect than the chosen sampling strategy.
Evaluating Gender Bias in Natural Language Inference
This work proposes an evaluation methodology for measuring gender-bias stereotypes by constructing a challenge task that pairs a gender-neutral premise against a gender-specific hypothesis, and finds that three models trained on the MNLI and SNLI datasets are significantly prone to gender-induced prediction errors.
Racial Bias in Hate Speech and Abusive Language Detection Datasets
This work examines evidence of systematic racial bias in five different sets of Twitter data annotated for hate speech and abusive language, as classifiers trained on them tend to predict that tweets written in African-American English are abusive at substantially higher rates.
Mitigating Gender Bias in Natural Language Processing: Literature Review
This paper discusses gender bias in terms of four forms of representation bias, analyzes methods for recognizing gender bias in NLP, and discusses the advantages and drawbacks of existing gender-debiasing methods.
Intersectional Bias in Hate Speech and Abusive Language Datasets
This study provides the first systematic evidence of intersectional bias in datasets of hate speech and abusive language on social media, using a publicly available annotated Twitter dataset.
Mitigating Political Bias in Language Models Through Reinforced Calibration
Current large-scale language models can be politically biased as a result of the data they are trained on, potentially causing serious problems when they are deployed in real-world settings. In this…
Bias and comparison framework for abusive language datasets
Recently, numerous datasets have been produced as research activity in the field of automatic detection of abusive language and hate speech has increased. A problem with this diversity is that they…
Detecting Gender Stereotypes: Lexicon vs. Supervised Learning Methods
This paper reexamines the role of gender stereotype detection in the context of modern tools by comparatively analyzing the efficacy of lexicon-based approaches and the end-to-end, ML-based approaches prevalent in state-of-the-art natural language processing systems.
Gender Bias in Contextualized Word Embeddings
It is shown that a state-of-the-art coreference system that depends on ELMo inherits its bias, demonstrating significant bias on the WinoBias probing corpus, and two methods to mitigate such gender bias are explored.
The Risk of Racial Bias in Hate Speech Detection
This work proposes *dialect* and *race priming* as ways to reduce racial bias in annotation, showing that when annotators are made explicitly aware of an AAE tweet's dialect, they are significantly less likely to label the tweet as offensive.

References

Showing 1–10 of 22 references
Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems
The Equity Evaluation Corpus (EEC) is presented, consisting of 8,640 English sentences carefully chosen to tease out biases towards certain races and genders; several of the systems tested show statistically significant bias.
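The EEC's core idea is easy to sketch: sentence pairs differ only in a gendered name or noun phrase, and the system under test should assign both sides the same score. The templates, name pairs, and `sentiment_score` stub below are illustrative placeholders, not the released corpus.

```python
# EEC-style paired probe (sketch; `sentiment_score` is a placeholder).
TEMPLATES = [
    "The situation makes {} furious.",
    "I made {} feel angry.",
    "The conversation with {} was heartbreaking.",
]
PAIRS = [("her", "him"), ("my sister", "my brother"), ("Amanda", "Alan")]

def sentiment_score(sentence: str) -> float:
    """Hypothetical system under audit; returns a score in [0, 1]."""
    return 0.5

def mean_gender_gap(score_fn) -> float:
    diffs = [
        score_fn(t.format(f)) - score_fn(t.format(m))
        for t in TEMPLATES
        for f, m in PAIRS
    ]
    return sum(diffs) / len(diffs)

print(f"Mean female-minus-male score gap: {mean_gender_gap(sentiment_score):+.4f}")
```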
Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods
A data-augmentation approach is demonstrated that, in combination with existing word-embedding debiasing techniques, removes the bias demonstrated by rule-based, feature-rich, and neural coreference systems on WinoBias without significantly affecting their performance on existing datasets.
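A minimal version of such gender-swapping augmentation could look like the following: each training example gets a copy with gendered words flipped so both variants carry the same label. The word list is a tiny illustrative subset, and the ambiguity that real systems resolve with part-of-speech tags is glossed over here.

```python
# Gender-swap data augmentation (sketch with a toy word list).
GENDER_PAIRS = {"he": "she", "him": "her", "his": "her",
                "man": "woman", "men": "women", "father": "mother"}
# Bidirectional map; note "her" -> "his" is a simplification: choosing
# between her -> his and her -> him properly needs part-of-speech tags.
SWAP = {**GENDER_PAIRS, **{v: k for k, v in GENDER_PAIRS.items()}}

def gender_swap(sentence: str) -> str:
    return " ".join(SWAP.get(tok, tok) for tok in sentence.lower().split())

def augment(dataset):
    """Yield each (sentence, label) example plus its gender-swapped twin."""
    for sentence, label in dataset:
        yield sentence, label
        yield gender_swap(sentence), label

print(list(augment([("he is my father", 0)])))
# [('he is my father', 0), ('she is my mother', 0)]
```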
Measuring and Mitigating Unintended Bias in Text Classification
A new approach to measuring and mitigating unintended bias in machine learning models is introduced, using a set of common demographic identity terms as the subset of input features on which to measure bias.
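The identity-term measurement could be realized, in simplified form, as a per-term false-positive audit over a labeled test set: among non-abusive sentences mentioning each term, how often does the model flag them? The paper itself works with a synthetic, template-generated test set and AUC-based metrics; this FPR sketch with a placeholder `predict_abusive` is a simplified stand-in.

```python
# Per-identity-term false-positive audit (simplified sketch).
from collections import defaultdict

IDENTITY_TERMS = ["woman", "man", "gay", "muslim"]  # illustrative subset

def predict_abusive(sentence: str) -> float:
    """Placeholder for the classifier under audit."""
    return 0.0

def per_term_fpr(examples, predict_fn, threshold=0.5):
    """examples: iterable of (sentence, label); label 0 means non-abusive."""
    flagged, total = defaultdict(int), defaultdict(int)
    for sentence, label in examples:
        if label != 0:
            continue  # false positives are only defined on true negatives
        tokens = set(sentence.lower().split())
        for term in IDENTITY_TERMS:
            if term in tokens:
                total[term] += 1
                flagged[term] += int(predict_fn(sentence) >= threshold)
    return {term: flagged[term] / total[term] for term in total}

data = [("she is a kind woman", 0), ("that man is friendly", 0)]
print(per_term_fpr(data, predict_abusive))
```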
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
This work empirically demonstrates that its algorithms significantly reduce gender bias in embeddings while preserving their useful properties, such as the ability to cluster related concepts and to solve analogy tasks.
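The geometric core of the method is compact enough to sketch: estimate a gender direction, then subtract each gender-neutral word vector's projection onto it. This toy numpy version derives the direction from a single he/she difference and uses random vectors; the paper uses several definitional pairs, PCA, and an additional equalization step.

```python
# "Neutralize" step of hard debiasing (toy sketch with random vectors).
import numpy as np

def gender_direction(emb):
    g = emb["he"] - emb["she"]  # paper: PCA over several definitional pairs
    return g / np.linalg.norm(g)

def neutralize(v, g):
    v = v - (v @ g) * g           # remove the component along the gender axis
    return v / np.linalg.norm(v)  # re-normalize

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["he", "she", "programmer"]}
g = gender_direction(emb)
print("projection before:", emb["programmer"] @ g)
print("projection after: ", neutralize(emb["programmer"], g) @ g)  # ~0
```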
Deeper Attention to Abusive User Content Moderation
A novel, deep, classification-specific attention mechanism improves the performance of the RNN further, and can also highlight suspicious words for free, without including highlighted words in the training data.
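A plausible shape for such a mechanism, sketched in PyTorch under assumed dimensions: the attention weights that build the sentence representation double as per-token highlights, so no extra supervision on which words to mark is needed. This illustrates the general idea, not the paper's exact architecture.

```python
# RNN with attention whose weights double as word highlights (sketch).
import torch
import torch.nn as nn

class AttentiveRNN(nn.Module):
    def __init__(self, vocab=10_000, emb=100, hid=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.rnn = nn.GRU(emb, hid, batch_first=True)
        self.attn = nn.Linear(hid, 1)  # scores each hidden state
        self.out = nn.Linear(hid, 1)   # abusive / non-abusive logit

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))           # (B, T, hid)
        weights = torch.softmax(self.attn(h), dim=1)  # (B, T, 1)
        summary = (weights * h).sum(dim=1)            # attention-weighted pooling
        return self.out(summary), weights.squeeze(-1) # logits + highlights

logits, highlights = AttentiveRNN()(torch.randint(0, 10_000, (2, 12)))
```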
Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior
This work proposes an incremental and iterative methodology that utilizes the power of crowdsourcing to annotate a large-scale collection of tweets with a set of abuse-related labels, and identifies a reduced but robust set of labels.
One-step and Two-step Classification for Abusive Language Detection on Twitter
This research explores a two-step approach of performing classification on abusive language and then classifying it into specific types, and compares it with a one-step approach of performing a single multi-class classification for detecting sexist and racist language.
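The two schemes being compared are easy to state in code; a hedged sketch with placeholder classifiers:

```python
# Two-step classification (sketch; both classifiers are placeholders).
def is_abusive(text: str) -> bool:
    """Step 1: binary abusive-language detector; replace with a trained model."""
    return False

def abuse_type(text: str) -> str:
    """Step 2: fine-grained classifier, run only on abusive text."""
    return "sexism"  # or "racism"

def classify_two_step(text: str) -> str:
    return abuse_type(text) if is_abusive(text) else "none"

# The one-step alternative is a single classifier over
# {"none", "sexism", "racism"} trained on the same data.
```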
Deep Learning for Hate Speech Detection in Tweets
Experiments on a benchmark dataset of 16K annotated tweets show that such deep learning methods outperform state-of-the-art char/word n-gram methods by ~18 F1 points.
Mitigating Unwanted Biases with Adversarial Learning
This work presents a framework for mitigating biases concerning demographic groups by including a variable Z for the group of interest and simultaneously learning a predictor and an adversary, which results in accurate predictions that exhibit less evidence of stereotyping Z.
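The setup lends itself to a compact sketch, with one hedge: the paper modifies the adversary's gradient with a projection term, while the version below uses the simpler and more common gradient-reversal trick as a stand-in. Networks, shapes, and data are illustrative.

```python
# Adversarial debiasing via gradient reversal (simplified stand-in).
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad  # flip the gradient flowing back into the predictor

predictor = nn.Sequential(nn.Linear(300, 64), nn.ReLU(), nn.Linear(64, 1))
adversary = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam(list(predictor.parameters()) + list(adversary.parameters()))
bce = nn.BCEWithLogitsLoss()

x = torch.randn(32, 300)                  # input features (placeholder)
y = torch.randint(0, 2, (32, 1)).float()  # task label
z = torch.randint(0, 2, (32, 1)).float()  # protected variable Z

y_hat = predictor(x)
z_hat = adversary(GradReverse.apply(y_hat))  # adversary tries to recover Z
loss = bce(y_hat, y) + bce(z_hat, z)         # predictor is pushed to hide Z
opt.zero_grad()
loss.backward()
opt.step()
```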
Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter
A list of criteria founded in critical race theory is provided and used to annotate a publicly available corpus of more than 16k tweets, and a dictionary based on the most indicative words in the data is presented.