Evaluating Gender Bias in Machine Translation

@inproceedings{Stanovsky2019EvaluatingGB,
  title={Evaluating Gender Bias in Machine Translation},
  author={Gabriel Stanovsky and Noah A. Smith and Luke Zettlemoyer},
  booktitle={ACL},
  year={2019}
}
We present the first challenge set and evaluation protocol for the analysis of gender bias in machine translation (MT). [...] We devise an automatic gender bias evaluation method for eight target languages with grammatical gender, based on morphological analysis (e.g., the use of female inflection for the word "doctor"). Our analyses show that four popular industrial MT systems and two recent state-of-the-art academic MT models are significantly prone to gender-biased translation errors for all…

Extending Challenge Sets to Uncover Gender Bias in Machine Translation: Impact of Stereotypical Verbs and Adjectives
TLDR
An extension of the first, and so far only, challenge set explicitly designed to measure the extent of gender bias in MT systems is presented, adding gender-biased adjectives and sentences with gender-biased verbs.
Evaluating Gender Bias in Hindi-English Machine Translation
TLDR
This work attempts to evaluate and quantify the gender bias within a Hindi-English machine translation system by implementing a modified version of the existing TGBI metric based on the grammatical considerations for Hindi.
Gender Coreference and Bias Evaluation at WMT 2020
TLDR
This work presents the largest evidence for gender coreference and bias in machine translation in more than 19 systems submitted to the WMT over four diverse target languages: Czech, German, Polish, and Russian.
Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus
TLDR
This work presents the first thorough investigation of gender bias in speech translation, contributing the release of a benchmark useful for future studies and a comparison of different technologies on two language directions (English-Italian/French).
Mitigating Gender Bias in Machine Translation with Target Gender Annotations
TLDR
It is argued that the information necessary for an adequate translation cannot always be deduced from the sentence being translated, or might even depend on external knowledge, so a method for training machine translation systems to use word-level annotations containing information about the subject's gender is presented.
How to Split: the Effect of Word Segmentation on Gender Bias in Speech Translation
TLDR
This work considers a model that systematically and disproportionately favours masculine over feminine forms to be biased, as it fails to properly recognize women, and proposes a combined approach that preserves the overall translation quality of BPE while leveraging the higher ability of character-based segmentation to properly translate gender.
gENder-IT: An Annotated English-Italian Parallel Challenge Set for Cross-Linguistic Natural Gender Phenomena
Languages differ in terms of the absence or presence of gender features, the number of gender classes and whether and where gender features are explicitly marked. These cross-linguistic differences…
Machine Translationese: Effects of Algorithmic Bias on Linguistic Complexity in Machine Translation
TLDR
It is hypothesized that ‘algorithmic bias’, i.e. an exacerbation of frequently observed patterns in combination with a loss of less frequent ones, not only exacerbates societal biases present in current datasets but could also lead to an artificially impoverished language: ‘machine translationese’.
First the worst: Finding better gender translations during beam search
TLDR
This work focuses on gender bias resulting from systematic errors in grammatical gender translation, which can lead to human referents being misrepresented or misgendered. It experiments with reranking n-best lists using gender features obtained automatically from the source sentence, and with applying gender constraints while decoding to improve n-best list gender diversity.
Towards Mitigating Gender Bias in a decoder-based Neural Machine Translation model by Adding Contextual Information
TLDR
Improvements are shown both in translation quality as well as in gender bias mitigation on WinoMT, implemented in a decoder-based neural MT system.

References

SHOWING 1-10 OF 28 REFERENCES
Equalizing Gender Biases in Neural Machine Translation with Word Embeddings Techniques
TLDR
This work proposes, experiments with, and analyzes the integration of two debiasing techniques over GloVe embeddings in the Transformer translation architecture, and shows that the proposed system learns to equalize existing biases from the baseline system.
Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods
TLDR
A data-augmentation approach is demonstrated that, in combination with existing word-embedding debiasing techniques, removes the bias demonstrated by rule-based, feature-rich, and neural coreference systems in WinoBias without significantly affecting their performance on existing datasets.
A Large-Scale Test Set for the Evaluation of Context-Aware Pronoun Translation in Neural Machine Translation
TLDR
This paper presents a test suite of contrastive translations focused specifically on the translation of pronouns and shows that, while gains in BLEU are moderate for those systems, they outperform baselines by a large margin in terms of accuracy on the contrastive test set.
Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns
TLDR
GAP, a gender-balanced labeled corpus of 8,908 ambiguous pronoun–name pairs sampled to provide diverse coverage of challenges posed by real-world text, is presented and released, and it is shown that syntactic structure and continuous neural models provide promising, complementary cues for approaching the challenge.
Evaluating Discourse Phenomena in Neural Machine Translation
TLDR
This article presents hand-crafted discourse test sets, designed to test the recently proposed multi-encoder NMT models' ability to exploit previous source and target sentences, and explores a novel way of exploiting context from the previous sentence.
Understanding Back-Translation at Scale
TLDR
This work broadens the understanding of back-translation and investigates a number of methods to generate synthetic source sentences, finding that in all but resource-poor settings back-translations obtained via sampling or noised beam outputs are most effective.
Gender Bias in Coreference Resolution
TLDR
A novel, Winograd schema-style set of minimal pair sentences that differ only by pronoun gender is introduced, and systematic gender bias in three publicly-available coreference resolution systems is evaluated and confirmed.
Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them
Word embeddings are widely used in NLP for a vast range of tasks. It was shown that word embeddings derived from text corpora reflect gender biases in society, causing serious concern. Several recent…
A Challenge Set Approach to Evaluating Machine Translation
TLDR
This work presents an English-French challenge set approach to translation evaluation and error analysis, and uses it to analyze phrase-based and neural systems, providing a more fine-grained picture of the strengths of neural systems.
Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
TLDR
This work proposes to inject corpus-level constraints for calibrating existing structured prediction models and designs an algorithm based on Lagrangian relaxation for collective inference to reduce the magnitude of bias amplification in multilabel object classification and visual semantic role labeling.