Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, Kai-Wei Chang
North American Chapter of the Association for Computational Linguistics

In this paper, we introduce a new benchmark for coreference resolution focused on gender bias, WinoBias. Our corpus contains Winograd-schema style sentences with entities corresponding to people referred to by their occupation (e.g., the nurse, the doctor, the carpenter). We demonstrate that a rule-based, a feature-rich, and a neural coreference system all link gendered pronouns to pro-stereotypical entities with higher accuracy than anti-stereotypical entities, by an average difference of 21.1 in…
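
The pro- versus anti-stereotypical comparison described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' evaluation code: the `Example` record, the `accuracy` and `bias_gap` helpers, and the toy data are all hypothetical, and plain accuracy is assumed as the metric because the abstract above is truncated before it names one.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One hypothetical evaluation item for a single gendered pronoun."""
    condition: str   # "pro" (pro-stereotypical) or "anti" (anti-stereotypical)
    gold: str        # occupation the pronoun actually refers to
    predicted: str   # occupation the system linked the pronoun to


def accuracy(examples, condition):
    """Percentage of items in the given condition whose link is correct."""
    subset = [ex for ex in examples if ex.condition == condition]
    if not subset:
        raise ValueError(f"no examples for condition {condition!r}")
    correct = sum(ex.gold == ex.predicted for ex in subset)
    return 100.0 * correct / len(subset)


def bias_gap(examples):
    """Pro-stereotypical accuracy minus anti-stereotypical accuracy."""
    return accuracy(examples, "pro") - accuracy(examples, "anti")


# Toy predictions (invented for illustration, not results from the paper).
toy = [
    Example("pro", "developer", "developer"),
    Example("pro", "nurse", "nurse"),
    Example("pro", "carpenter", "carpenter"),
    Example("pro", "doctor", "doctor"),
    Example("anti", "developer", "designer"),
    Example("anti", "nurse", "nurse"),
    Example("anti", "carpenter", "carpenter"),
    Example("anti", "doctor", "nurse"),
]

print(bias_gap(toy))
```

A gap near zero would indicate that the system resolves pronouns equally well in both conditions; a large positive gap indicates reliance on occupational stereotypes.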

Gender Bias in Coreference Resolution

A novel, Winograd schema-style set of minimal pair sentences that differ only by pronoun gender is introduced, and systematic gender bias in three publicly available coreference resolution systems is evaluated and confirmed.

Gender Coreference and Bias Evaluation at WMT 2020

This work presents the largest body of evidence on gender coreference and bias in machine translation, covering more than 19 systems submitted to WMT across four diverse target languages: Czech, German, Polish, and Russian.

Incorporating Subjectivity into Gendered Ambiguous Pronoun (GAP) Resolution using Style Transfer

This work introduces GAP-Subjective, a new evaluation dataset for gender bias in coreference resolution that increases the coverage of the original GAP dataset by including subjective sentences, and outlines the methodology used to create it.

Gendered Ambiguous Pronoun (GAP) Shared Task at the Gender Bias in NLP Workshop 2019

This work reviews the approaches of eleven systems with accepted description papers on gendered ambiguous pronoun (GAP) resolution, noting their effective use of BERT, both via fine-tuning and for feature extraction, as well as ensembling.

Gender Bias in Contextualized Word Embeddings

It is shown that a state-of-the-art coreference system that depends on ELMo inherits its bias and demonstrates significant bias on the WinoBias probing corpus; two methods to mitigate such gender bias are explored.

Collecting a Large-Scale Gender Bias Dataset for Coreference Resolution and Machine Translation

Grammatical patterns indicating stereotypical and non-stereotypical gender-role assignments are found in corpora from three domains, resulting in the first large-scale gender bias dataset of 108K diverse real-world English sentences. Fine-tuning a coreference resolution model on this dataset is found to mitigate bias on a held-out set.

Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns

GAP, a gender-balanced labeled corpus of 8,908 ambiguous pronoun–name pairs sampled to provide diverse coverage of challenges posed by real-world text, is presented and released; it is shown that syntactic structure and continuous neural models provide promising, complementary cues for approaching the challenge.

Evaluating Gender Bias in Machine Translation

An automatic gender bias evaluation method for eight target languages with grammatical gender, based on morphological analysis, is devised; it shows that four popular industrial MT systems and two recent state-of-the-art academic MT models are significantly prone to gender-biased translation errors for all tested target languages.

NeuTral Rewriter: A Rule-Based and Neural Approach to Automatic Rewriting into Gender Neutral Alternatives

This work presents a rule-based and a neural approach to gender-neutral rewriting for English along with manually curated synthetic data (WinoBias+) and natural data (OpenSubtitles and Reddit) benchmarks.

Toward Gender-Inclusive Coreference Resolution: An Analysis of Gender and Bias Throughout the Machine Learning Lifecycle

It is confirmed that systems built without acknowledging and recognizing the complexity of gender fail on quality of service, stereotyping, and over- or under-representation, especially for binary and non-binary trans users.



Resolving Complex Cases of Definite Pronouns: The Winograd Schema Challenge

A knowledge-rich approach to the task of resolving complex cases of definite pronouns is employed, which yields a pronoun resolver that outperforms state-of-the-art resolvers by nearly 18 points in accuracy on the authors' dataset.

End-to-end Neural Coreference Resolution

This work introduces the first end-to-end coreference resolution model, which is trained to maximize the marginal likelihood of gold antecedent spans from coreference clusters and is factored to enable aggressive pruning of potential mentions.

Scoring Coreference Partitions of Predicted Mentions: A Reference Implementation

It is argued that mention manipulation for scoring predicted mentions is unnecessary, and potentially harmful as it could produce unintuitive results, and an open-source, thoroughly-tested reference implementation of the main coreference evaluation measures is made available.

Solving Hard Coreference Problems

This paper presents a general coreference resolution system that significantly improves state-of-the-art performance on hard, Winograd-style pronoun resolution cases, while still performing at the state-of-the-art level on standard coreference resolution datasets.

Easy Victories and Uphill Battles in Coreference Resolution

This work presents a state-of-the-art coreference system that captures various syntactic, discourse, and semantic phenomena implicitly, with a small number of homogeneous feature templates examining shallow properties of mentions, allowing it to win “easy victories” without crafted heuristics.

A Joint Framework for Coreference Resolution and Mention Head Detection

An ILP-based joint coreference resolution and mention head formulation is shown to yield significant improvements on coreference from raw text, outperforming existing state-of-the-art systems on both the ACE-2004 and the CoNLL-2012 datasets.

A Multi-Pass Sieve for Coreference Resolution

This work proposes a simple coreference architecture based on a sieve that applies tiers of deterministic coreference models one at a time from highest to lowest precision, and outperforms many state-of-the-art supervised and unsupervised models on several standard corpora.

Bootstrapping Path-Based Pronoun Resolution

This work learns the likelihood of coreference between a pronoun and a candidate noun based on the path in the parse tree between the two entities, and robustly addresses traditional syntactic coreference constraints.

Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints

This work proposes injecting corpus-level constraints to calibrate existing structured prediction models and designs an algorithm based on Lagrangian relaxation for collective inference, reducing the magnitude of bias amplification in multilabel object classification and visual semantic role labeling.