Social Chemistry 101: Learning to Reason about Social and Moral Norms

Maxwell Forbes, Jena D. Hwang, Vered Shwartz, Maarten Sap, Yejin Choi
Social norms---the unspoken commonsense rules about acceptable social behavior---are crucial in understanding the underlying causes and intents of people's actions in narratives. For example, underlying an action such as "wanting to call cops on my neighbors" are social norms that inform our conduct, such as "It is expected that you report crimes." We present Social Chemistry, a new conceptual formalism to study people's everyday social norms and moral judgments over a rich spectrum of real… 

Moral Mimicry: Large Language Models Produce Moral Rationalizations Tailored to Political Identity

Large Language Models (LLMs) have recently demonstrated impressive capability in generating fluent text. LLMs have also shown an alarming tendency to reproduce social biases, for example stereotypical…

On the Machine Learning of Ethical Judgments from Natural Language

Through an audit of recent work on computational approaches for predicting morality, this work examines the broader issues that arise from such efforts and offers a critique of such NLP methods for automating ethical decision-making.

Does Moral Code have a Moral Code? Probing Delphi’s Moral Philosophy

In an effort to guarantee that machine learning model outputs conform with human moral values, recent work has begun exploring the possibility of explicitly training models to learn the difference…

AiSocrates: Towards Answering Ethical Quandary Questions

It is argued that AiSocrates is a promising step toward developing an NLP system that incorporates human values explicitly via prompt instructions, and addresses safety concerns by providing a human-controllability option in choosing ethical principles.

The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems

The Moral Integrity Corpus (MIC) is a resource that captures the moral assumptions of 38k prompt-reply pairs using 99k distinct Rules of Thumb (RoTs); it is suggested that MIC will be a useful resource for understanding language models’ implicit moral assumptions and for flexibly benchmarking the integrity of conversational agents.

DREAM: Improving Situational QA by First Elaborating the Situation

Adding focused elaborations about a situation can improve a system’s reasoning about it, and may serve as an effective way of injecting new scenario-based knowledge into QA models.

DREAM: Uncovering Mental Models behind Language Models

DREAM is proposed, a model that takes a situational question as input and produces a mental model elaborating the situation, without any additional task-specific training data for mental models, and inherits its social commonsense through distant supervision from existing NLP resources.

A Word on Machine Ethics: A Response to Jiang et al. (2021)

This work focuses on a single case study of the recently proposed Delphi model and offers a critique of the project’s proposed method of automating morality judgments. It concludes with a discussion of how machine ethics could usefully proceed by focusing on current and near-future uses of technology, in a way that centers transparency and democratic values and allows for straightforward accountability.

Towards Socially Intelligent Agents with Mental State Transition and Human Utility

This work proposes a hybrid mental state parser that extracts information from both the dialogue and event observations and maintains a graphical representation of the agent’s mind, together with a transformer-based value model that learns human preferences from the human value dataset ValueNet.


The first major attempt to computationally explore the vast space of moral implications in real-world settings is conducted with Delphi, a unified model of descriptive ethics empowered by diverse data of people’s moral judgments from the Commonsense Norm Bank.

Liberals and conservatives rely on different sets of moral foundations.

Across 4 studies using multiple methods, liberals consistently showed greater endorsement and use of the Harm/care and Fairness/reciprocity foundations compared to the other 3 foundations, whereas conservatives endorsed and used the 5 foundations more equally.

A Theory of Blame

One of the most intriguing aspects of the assignment of blame is the contrast between the apparent simplicity of the everyday assertion or avoidance of blame and the complexity of the theoretical…

A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories

A new framework for evaluating story understanding and script learning: the ‘Story Cloze Test’, which requires a system to choose the correct ending to a four-sentence story, and a new corpus of 50k five-sentence commonsense stories, ROCStories, to enable this evaluation.

Bleu: a Method for Automatic Evaluation of Machine Translation

This work proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.

The Curious Case of Neural Text Degeneration

By sampling text from the dynamic nucleus of the probability distribution, which allows for diversity while effectively truncating the less reliable tail of the distribution, the resulting text better matches the quality of human text, yielding enhanced diversity without sacrificing fluency and coherence.

An Atlas of Cultural Commonsense for Machine Reasoning

This work introduces an approach that extends prior work on crowdsourcing commonsense knowledge by incorporating differences in knowledge that are attributable to cultural or national groups, and moves a step closer towards building a machine that doesn’t assume a rigid framework of universal commonsense knowledge, but rather has the ability to reason in a contextually and culturally sensitive way.

Scruples: A Corpus of Community Ethical Judgments on 32,000 Real-Life Anecdotes

This work introduces SCRUPLES, the first large-scale dataset with 625,000 ethical judgments over 32,000 real-life anecdotes, presents a new method to estimate the best possible performance on such tasks with inherently diverse label distributions, and explores likelihood functions that separate intrinsic from model uncertainty.

Would you Rather? A New Benchmark for Learning Machine Alignment with Cultural Values and Social Preferences

A newly introduced problem concerned with predicting the preferable option between two sentences describing scenarios that may involve social and cultural situations, framed as a natural language inference task with crowd-sourced preference votes by human players, obtained from a gamified voting platform.

Social Bias Frames: Reasoning about Social and Power Implications of Language

It is found that while state-of-the-art neural models are effective at high-level categorization of whether a given statement projects unwanted social bias, they are not effective at spelling out more detailed explanations in terms of Social Bias Frames.

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

BART is presented, a denoising autoencoder for pretraining sequence-to-sequence models, which matches the performance of RoBERTa on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks.