Lawyers are Dishonest? Quantifying Representational Harms in Commonsense Knowledge Resources

@inproceedings{Mehrabi2021LawyersAD,
  title={Lawyers are Dishonest? Quantifying Representational Harms in Commonsense Knowledge Resources},
  author={Ninareh Mehrabi and Pei Zhou and Fred Morstatter and Jay Pujara and Xiang Ren and Aram Galstyan},
  booktitle={EMNLP},
  year={2021}
}
Warning: this paper contains content that may be offensive or upsetting. Commonsense knowledge bases (CSKBs) are increasingly used for various natural language processing tasks. Since CSKBs are mostly human-generated and may reflect societal biases, it is important to ensure that such biases are not conflated with the notion of commonsense. Here we focus on two widely used CSKBs, ConceptNet and GenericsKB, and establish the presence of bias in the form of two types of representational harms…
Citations

RICA: Evaluating Robust Inference Capabilities Based on Commonsense Axioms
TLDR
A new challenge, RICA (Robust Inference using Commonsense Axioms), is introduced to evaluate robust commonsense inference despite textual perturbations; results show that pretrained language models (PTLMs) perform no better than random guessing in the zero-shot setting, are heavily impacted by statistical biases, and are not robust to perturbation attacks.
What do Bias Measures Measure?
TLDR
This work presents a comprehensive survey of existing bias measures in NLP as a function of the associated NLP tasks, metrics, datasets, and social biases and their corresponding harms, and proposes a documentation standard for bias measures to aid their development, categorization, and appropriate usage.
Explaining Toxic Text via Knowledge Enhanced Text Generation
TLDR
A novel knowledge-informed encoder-decoder framework is introduced that utilizes multiple knowledge sources to generate implications of biased text; compared to baselines, it generates more detailed explanations of stereotypes in toxic speech, both quantitatively and qualitatively.
PaCo: Preconditions Attributed to Commonsense Knowledge
TLDR
A novel challenge of reasoning with circumstantial preconditions of commonsense statements expressed in natural language is presented; results reveal a 10-30% gap between machine and human performance on the proposed tasks, showing that reasoning with preconditions is an open challenge.
Perturbation Augmentation for Fairer NLP
TLDR
It is shown that language models pre-trained on demographically perturbed corpora are fairer, at least according to the current best metrics for measuring model fairness, and that this improved fairness does not come at the expense of accuracy.
Probing Commonsense Explanation in Dialogue Response Generation
TLDR
This study formalizes the problem by framing commonsense as a latent variable in the response generation (RG) task and using explanations for responses as a textual form of commonsense; 6k annotated explanations justifying responses are collected from four dialogue datasets and verified by humans.
Commonsense-Focused Dialogues for Response Generation: An Empirical Study
TLDR
This paper auto-extracts commonsense-focused dialogues from existing dialogue datasets by leveraging ConceptNet, a commonsense knowledge graph, and proposes an approach for automatic evaluation of commonsense that relies on features derived from ConceptNet and pre-trained language and dialogue models; this approach shows reasonable correlation with human evaluation of responses' commonsense quality.
Think Before You Speak: Explicitly Generating Implicit Commonsense Knowledge for Response Generation
TLDR
Think-Before-Speaking, a generative approach that first externalizes implicit commonsense knowledge (think) and then uses this knowledge to generate responses (speak), is presented; the authors argue that externalizing implicit knowledge allows more efficient learning, produces more informative responses, and enables more explainable models.
CoreQuisite: Circumstantial Preconditions of Common Sense Knowledge
TLDR
A dataset called CoreQuisite is presented, which annotates commonsense facts with preconditions expressed in natural language; a 10-30% gap is shown between machine and human performance on the associated tasks.
GreaseLM: Graph REASoning Enhanced Language Models for Question Answering
TLDR
This work proposes GREASELM, a new model that fuses encoded representations from pretrained LMs and graph neural networks over multiple layers of modality interaction operations, allowing language context representations to be grounded by structured world knowledge, and allowing linguistic nuances in the context to inform the graph representations of knowledge.
...

References

Showing 1-10 of 52 references
StereoSet: Measuring stereotypical bias in pretrained language models
TLDR
StereoSet, a large-scale natural English dataset for measuring stereotypical biases in four domains (gender, profession, race, and religion), is presented; popular models like BERT, GPT-2, RoBERTa, and XLNet are shown to exhibit strong stereotypical biases.
UNQOVERing Stereotypical Biases via Underspecified Questions
TLDR
UNQOVER, a general framework to probe and quantify biases through underspecified questions, is presented, showing that a naive use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors: positional dependence and question independence.
RICA: Evaluating Robust Inference Capabilities Based on Commonsense Axioms
TLDR
A new challenge, RICA (Robust Inference using Commonsense Axioms), is introduced to evaluate robust commonsense inference despite textual perturbations; results show that pretrained language models (PTLMs) perform no better than random guessing in the zero-shot setting, are heavily impacted by statistical biases, and are not robust to perturbation attacks.
Language (Technology) is Power: A Critical Survey of “Bias” in NLP
TLDR
A greater recognition of the relationships between language and social hierarchies is urged, encouraging researchers and practitioners to articulate their conceptualizations of “bias” and to center work around the lived experiences of members of communities affected by NLP systems.
ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning
TLDR
Experimental results demonstrate that multitask models that incorporate the hierarchical structure of if-then relation types lead to more accurate inference compared to models trained in isolation, as measured by both automatic and human evaluation.
Empath: Understanding Topic Signals in Large-Scale Text
TLDR
Empath is a tool that can generate and validate new lexical categories on demand from a small set of seed terms; it draws connotations between words and phrases via a neural embedding trained on more than 1.8 billion words of modern fiction.
Towards Controllable Biases in Language Generation
TLDR
The effectiveness of the approach at facilitating bias analysis is shown by finding topics that correspond to demographic inequalities in generated text and comparing the relative effectiveness of inducing biases for different demographics.
Connecting the Dots: A Knowledgeable Path Generator for Commonsense Question Answering
TLDR
This paper augments a general commonsense QA framework with a knowledgeable path generator by extrapolating over existing paths in a KG with a state-of-the-art language model, which learns to connect a pair of entities in text with a dynamic, and potentially novel, multi-hop relational path.
Mitigating Gender Bias in Natural Language Processing: Literature Review
TLDR
This paper discusses gender bias in terms of four forms of representation bias, analyzes methods for recognizing gender bias in NLP, and reviews the advantages and drawbacks of existing gender debiasing methods.
Unsupervised Commonsense Question Answering with Self-Talk
TLDR
An unsupervised framework based on self-talk, inspired by inquiry-based discovery learning, is presented as a novel approach to multiple-choice commonsense tasks; it improves performance on several benchmarks and competes with models that obtain knowledge from external KBs.
...