Corpus ID: 231924754

They, Them, Theirs: Rewriting with Gender-Neutral English

Tony Sun, Kellie Webster, Apurva Shah, William Yang Wang, Melvin Johnson
Responsible development of technology involves making applications inclusive of the diverse set of users they hope to support. An important part of this is understanding the many ways to refer to a person and being able to fluently change between the different forms as needed. We perform a case study on the singular they, a common way to promote gender inclusion in English. We define a rewriting task, create an evaluation benchmark, and show how a model can be trained to produce gender-neutral…
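The rewriting task described in the abstract can be illustrated with a minimal rule-based sketch. This is not the paper's model: the pronoun table, the naive treatment of ambiguous forms like "his"/"her", and the one-token verb fix-up are all simplifying assumptions for illustration only.

```python
# Toy gender-neutral rewriter: maps gendered third-person pronouns to
# singular they and repairs verb agreement on the immediately following verb.
PRONOUN_MAP = {
    "he": "they", "she": "they",
    "him": "them",
    "his": "their",   # naive: always treated as a possessive determiner
    "her": "their",   # naive: ignores the object-pronoun reading of "her"
    "himself": "themself", "herself": "themself",
}

# Singular they takes plural verb agreement: "he is" -> "they are".
VERB_MAP = {"is": "are", "was": "were", "has": "have", "does": "do"}

def rewrite_gender_neutral(sentence: str) -> str:
    out = []
    prev_subject = False  # previous token was a rewritten subject pronoun
    for tok in sentence.split():
        core = tok.strip(".,!?")
        trail = tok[len(core):]
        low = core.lower()
        if low in PRONOUN_MAP:
            repl = PRONOUN_MAP[low]
            if core[0].isupper():
                repl = repl.capitalize()
            out.append(repl + trail)
            prev_subject = low in ("he", "she")
        elif prev_subject and low in VERB_MAP:
            out.append(VERB_MAP[low] + trail)
            prev_subject = False
        else:
            out.append(tok)
            prev_subject = False
    return " ".join(out)
```

For example, `rewrite_gender_neutral("She is a doctor.")` yields `"They are a doctor."`. The paper's point is precisely that real text defeats such surface rules (ambiguous "her", names, coreference), motivating a learned model and an evaluation benchmark.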


NeuTral Rewriter: A Rule-Based and Neural Approach to Automatic Rewriting into Gender Neutral Alternatives

This work presents a rule-based and a neural approach to gender-neutral rewriting for English along with manually curated synthetic data (WinoBias+) and natural data (OpenSubtitles and Reddit) benchmarks.

User-Centric Gender Rewriting

A multi-step system is developed that combines the strengths of rule-based and neural rewriting models for gender rewriting in contexts involving two users (first and second grammatical persons) with independent grammatical gender preferences.

How Conservative are Language Models? Adapting to the Introduction of Gender-Neutral Pronouns

Gender-neutral pronouns have recently been introduced in many languages, both to include non-binary people and to serve as a generic singular. Recent results from psycholinguistics suggest that gender-neutral…

Under the Morphosyntactic Lens: A Multifaceted Evaluation of Gender Bias in Speech Translation

Gender bias is largely recognized as a problematic phenomenon affecting language technologies, with recent studies underscoring that it might surface differently across languages. However, most of…

Supporting Gender-Neutral Writing in German

The avoidance of the generic masculine is an important part of a gender-neutral use of the German language. This paper presents a rule-based natural language processing system that identifies…

The Arabic Parallel Gender Corpus 2.0: Extensions and Analyses

A new corpus is presented for gender identification and rewriting in contexts involving one or two target users (I and/or You), i.e. first and second grammatical persons with independent grammatical gender preferences, in Arabic, a gender-marking, morphologically rich language.

Gender Bias in Machine Translation

This work critically reviews current conceptualizations of bias in machine translation technology in light of theoretical insights from related disciplines and points toward potential directions for future work.

First the Worst: Finding Better Gender Translations During Beam Search

This work constrains beam search to improve gender diversity in n-best lists and reranks n-best lists using gender features obtained from the source sentence, demonstrating its utility for consistently gendering named entities and its flexibility to handle new gendered language beyond the binary.

A Survey on Gender Bias in Natural Language Processing

A survey of 304 papers on gender bias in natural language processing finds that research on gender bias suffers from four core limitations and sees overcoming these limitations as a necessary development in future research.

Challenges in Measuring Bias via Open-Ended Language Generation

It is found that the practice of measuring biases through text completion is prone to yielding contradictory results under different experimental settings, and recommendations for reporting biases in open-ended language generation are provided for a more complete outlook on the biases exhibited by a given language model.

Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods

A data-augmentation approach is demonstrated that, in combination with existing word-embedding debiasing techniques, removes the bias demonstrated by rule-based, feature-rich, and neural coreference systems in WinoBias without significantly affecting their performance on existing datasets.
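The data-augmentation idea summarized above can be sketched as counterfactual gender swapping: each training sentence yields a copy with gendered words flipped, so the model sees both variants equally often. The word list below is a small illustrative assumption, not the paper's full intervention (which also handles names and entity anonymization):

```python
# Minimal counterfactual gender-swap augmentation sketch.
SWAP = {
    "he": "she", "she": "he",
    "him": "her",
    "his": "her",     # naive: ignores the his/her possessive asymmetry
    "himself": "herself", "herself": "himself",
    "man": "woman", "woman": "man",
}

def gender_swap(sentence: str) -> str:
    out = []
    for tok in sentence.split():
        core = tok.strip(".,!?")
        trail = tok[len(core):]
        low = core.lower()
        if low in SWAP:
            repl = SWAP[low]
            if core[0].isupper():
                repl = repl.capitalize()
            out.append(repl + trail)
        else:
            out.append(tok)
    return " ".join(out)

def augment(corpus):
    # Train on the union of original and gender-swapped sentences.
    return corpus + [gender_swap(s) for s in corpus]
```

The design choice is that bias is attacked in the data rather than the model: a system trained on the augmented corpus has no statistical reason to associate an occupation with one gender.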

Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns

GAP, a gender-balanced labeled corpus of 8,908 ambiguous pronoun–name pairs sampled to provide diverse coverage of challenges posed by real-world text, is presented and released; syntactic structure and continuous neural models are shown to provide promising, complementary cues for approaching the challenge.

The Woman Worked as a Babysitter: On Biases in Language Generation

The notion of the regard towards a demographic is introduced, the varying levels of regard towards different demographics are used as a defining metric for bias in NLG, and the extent to which sentiment scores are a relevant proxy metric for regard is analyzed.

Mitigating Gender Bias in Natural Language Processing: Literature Review

This paper discusses gender bias based on four forms of representation bias and analyzes methods recognizing gender bias in NLP, and discusses the advantages and drawbacks of existing gender debiasing methods.

Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them

Word embeddings are widely used in NLP for a vast range of tasks. It was shown that word embeddings derived from text corpora reflect gender biases in society, causing serious concern. Several recent…

StereoSet: Measuring stereotypical bias in pretrained language models

StereoSet, a large-scale natural English dataset to measure stereotypical biases in four domains: gender, profession, race, and religion, is presented, and it is shown that popular models like BERT, GPT-2, RoBERTa, and XLNet exhibit strong stereotypical biases.

Language (Technology) is Power: A Critical Survey of “Bias” in NLP

A greater recognition of the relationships between language and social hierarchies is urged, encouraging researchers and practitioners to articulate their conceptualizations of “bias” and to center work around the lived experiences of members of communities affected by NLP systems.

Bleu: a Method for Automatic Evaluation of Machine Translation

This work proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
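Since BLEU is the standard metric for rewriting and translation tasks like the ones above, a self-contained sketch of its core computation (clipped n-gram precision plus a brevity penalty) may be useful. This is a simplified single-reference, unsmoothed version, not the full metric from the paper:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Sentence-level BLEU with one reference and no smoothing."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipped matches: a candidate n-gram counts only as many
        # times as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(len(cand) - n + 1, 0)
        if total == 0 or overlap == 0:
            return 0.0  # unsmoothed: any zero precision zeroes the score
        precisions.append(overlap / total)
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

In practice one would use a standard implementation (e.g. sacreBLEU) for comparable scores; the sketch only shows why BLEU is cheap and language-independent, as the abstract claims.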

Gender Bias in Coreference Resolution

A novel, Winograd schema-style set of minimal pair sentences that differ only by pronoun gender are introduced, and systematic gender bias in three publicly-available coreference resolution systems is evaluated and confirmed.

Sequence to Sequence Learning with Neural Networks

This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions about sequence structure. It finds that reversing the order of the words in all source sentences markedly improved the LSTM's performance, because doing so introduced many short-term dependencies between the source and target sentences, making the optimization problem easier.
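The source-reversal trick described above is a pure preprocessing step, not an architecture change; a minimal sketch of it:

```python
def reverse_source(pairs):
    """Reverse the token order of each source sentence; targets stay intact.

    For a pair like ("a b c", "x y z"), the model then sees "c b a" -> "x y z",
    so the first source token "a" ends up temporally adjacent to the start of
    the target, creating the short-term dependencies that ease optimization.
    """
    return [(" ".join(src.split()[::-1]), tgt) for src, tgt in pairs]
```

The same transformation must of course be applied consistently at both training and inference time.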