Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts

Ashutosh Baheti, Maarten Sap, Alan Ritter, and Mark O. Riedl

Dialogue models trained on human conversations inadvertently learn to generate toxic responses. In addition to producing explicitly offensive utterances, these models can also implicitly insult a group or individual by aligning themselves with an offensive statement. To better understand the dynamics of contextually offensive language, we investigate the stance of dialogue model responses in offensive Reddit conversations. Specifically, we create ToxiChat, a crowd-annotated dataset of 2,000… 

Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks

This paper proposes a novel DIALBIAS FRAME for analyzing social bias in conversations pragmatically, supporting more comprehensive bias-related analyses than simple dichotomous annotations, and introduces the CDIAL-BIAS DATASET, the first well-annotated Chinese social bias dialog dataset.

On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark

A dialogue safety classifier is trained to provide a strong baseline for context-sensitive dialogue unsafety detection, and a taxonomy for dialogue safety specifically designed to capture unsafe behaviors in human-bot dialogue settings is proposed.

Classifying and Automatically Neutralizing Hate Speech with Deep Learning Ensembles and Dataset Ensembles

This model is an ensemble system that uses a BERT encoder to identify hateful words and phrases. It contributes a two-fold pipeline: a classification model first detects hate speech on a word-by-word basis, and a per-word seq2seq model then replaces each hateful word with a more neutral one.

On Controlling Fallback Responses for Grounded Dialogue Generation

A novel end-to-end framework is proposed in which the generator automatically produces a control token that biases the succeeding response towards informativeness for answerable contexts and towards a fallback for unanswerable contexts.

Target-Guided Dialogue Response Generation Using Commonsense and Data Augmentation

A new technique is introduced for target-guided response generation, which first finds a bridging path of commonsense knowledge concepts between the source and the target, and then uses the identified bridging path to generate transition responses.

Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning

This work introduces INSTRUCTDIAL, an instruction tuning framework for dialogue, which consists of a repository of 48 diverse dialogue tasks in a unified text-to-text format created from 59 openly available dialogue datasets, and reveals that it enables good zero-shot performance on unseen datasets and tasks such as dialogue evaluation and intent detection, and even better performance in a few-shot setting.

Revealing Persona Biases in Dialogue Systems

It is observed that adopting personas can actually decrease harmful responses, compared to not using any personas, and it is found that persona choices can affect the degree of harms in generated responses and thus should be systematically evaluated before deployment.

PanGu-Bot: Efficient Generative Dialogue Pre-training from Pre-trained Language Model

PanGu-Bot's response quality, knowledge correctness, and safety are still far from perfect, and further explorations are indispensable to building reliable and smart dialogue systems.

Improving Multi-label Malevolence Detection in Dialogues through Multi-faceted Label Correlation Enhancement

This work proposes a multi-label dialogue malevolence detection model, multi-faceted label correlation enhanced CRF (MCRF), with two label correlation mechanisms: label correlation in taxonomy (LCT) and label correlation in context (LCC). The model outperforms the best-performing baseline by a large margin.

Robust Conversational Agents against Imperceptible Toxicity Triggers

This work proposes attacks against conversational agents that are imperceptible, i.e., they fit the conversation in terms of coherency, relevancy, and fluency, while being effective and scalable, and establishes the generalizability of the accompanying defense mechanism to language generation models beyond conversational agents.

Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack

It is shown that offensive language used within a conversation critically depends on the dialogue context, and cannot be treated as a single-sentence offensive-language detection task as in most previous work.

Mitigating Gender Bias for Neural Dialogue Generation with Adversarial Learning

This paper proposes a novel adversarial learning framework Debiased-Chat to train dialogue models free from gender bias while keeping their performance, and shows that this framework significantly reduces gender bias in dialogue models while maintaining the response quality.

Polite Dialogue Generation Without Parallel Data

Human evaluation validates that while the Fusion and the retrieval-based models achieve politeness with poorer context-relevance, the LFT and Polite-RL models can produce significantly more polite responses without sacrificing dialogue quality.

Queens Are Powerful Too: Mitigating Gender Bias in Dialogue Generation

This work measures gender bias in dialogue data, and examines how this bias is actually amplified in subsequent generative chit-chat dialogue models, and considers three techniques to mitigate gender bias: counterfactual data augmentation, targeted data collection, and bias controlled training.

Wizard of Wikipedia: Knowledge-Powered Conversational Agents

The best performing dialogue models are able to conduct knowledgeable discussions on open-domain topics as evaluated by automatic metrics and human evaluations, while a new benchmark allows for measuring further improvements in this important research direction.

Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset

This work proposes a new benchmark for empathetic dialogue generation and EmpatheticDialogues, a novel dataset of 25k conversations grounded in emotional situations, and presents empirical comparisons of dialogue model adaptations for empathetic responding, leveraging existing models or datasets without requiring lengthy re-training of the full model.

DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation

It is shown that conversational systems that leverage DialoGPT generate more relevant, contentful and context-consistent responses than strong baseline systems.

Interactional Stancetaking in Online Forums

This article begins with annotations of three linked stance dimensions—affect, investment, and alignment—on 68 conversation threads from the online platform Reddit, and investigates thread structure and linguistic properties of stancetaking in online conversations.

Recipes for Safety in Open-domain Chatbots

A new human-and-model-in-the-loop framework is introduced, both for training safer models and for evaluating them, along with a novel method to distill safety considerations into generative models without the use of an external classifier at deployment time.

RedditBias: A Real-World Resource for Bias Evaluation and Debiasing of Conversational Language Models

RedditBias is presented, the first conversational dataset grounded in actual human conversations from Reddit, allowing for bias measurement and mitigation across four important bias dimensions: gender, race, religion, and queerness, and an evaluation framework is developed.