Learning from data in the mixed adversarial non-adversarial case: Finding the helpers and ignoring the trolls

Da Ju, Jing Xu, Y-Lan Boureau, Jason Weston
The promise of interaction between intelligent conversational agents and humans is that models can learn from such feedback in order to improve. Unfortunately, such exchanges in the wild will not always involve human utterances that are benign or of high quality, and will include a mixture of engaged (helpers) and unengaged or even malicious users (trolls). In this work we study how to perform robust learning in such an environment. We introduce a benchmark evaluation, SafetyMix, which can…

Learning New Skills after Deployment: Improving open-domain internet-driven dialogue with human feedback

This work collects deployment data of human interactions, collects various types of human feedback, and studies various algorithms for improving from such feedback in order to make recommendations on which type of feedback and algorithms work best.

The CRINGE Loss: Learning what language not to model

This work proposes a novel procedure to train with negative data called the CRINGE loss (ContRastive Iterative Negative GEneration), and shows the effectiveness of this approach across three different experiments on the tasks of safe generation, contradiction avoidance, and open-domain dialogue.

Revision Transformers: Getting RiT of No-Nos

This work proposes the Revision Transformer (RiT), a combination of a large-scale pre-trained LM that inherently but also diffusely encodes world knowledge with a clear-structured revision engine that makes it possible to update the model's knowledge with little effort and the help of user interaction.

BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage

The goal of this research program is to enable the community to study ever-improving responsible agents that learn through interaction in BlenderBot 3, a 175B parameter dialogue model capable of open-domain conversation with access to the internet and a long-term memory.

When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels

JUICER, a framework to make use of both binary and free-form textual human feedback, works by extending sparse binary feedback by training a satisfaction classifier to label the unlabeled data and training a reply corrector to map the bad replies to good ones.

Towards Boosting the Open-Domain Chatbot with Human Feedback

A novel and efficient approach, Diamante, to boost the open-domain chatbot, where two kinds of human feedback are collected and leveraged, the implicit preference in the data collection process is exploited, and generation-evaluation joint training is introduced.



Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack

It is shown that offensive language used within a conversation critically depends on the dialogue context, and cannot be viewed as a single sentence offensive detection task as in most previous work.

Explaining and Harnessing Adversarial Examples

It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.

Learning with Bad Training Data via Iterative Trimmed Loss Minimization

This paper proposes to iteratively minimize the trimmed loss, by alternating between selecting samples with lowest current loss, and retraining a model on only these samples, and proves that this process recovers the ground truth in generalized linear models with standard statistical assumptions.
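
The alternation described above can be sketched in a few lines. This is a minimal illustration rather than the paper's implementation: it assumes a plain least-squares model, and the function name `iterative_trimmed_least_squares`, the `keep_fraction` and `n_rounds` parameters, and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np

def iterative_trimmed_least_squares(X, y, keep_fraction=0.8, n_rounds=10):
    """Alternate between refitting on kept samples and re-selecting low-loss samples.

    Hypothetical helper for illustration; assumes a linear least-squares model.
    """
    n = X.shape[0]
    n_keep = max(1, int(keep_fraction * n))
    keep = np.arange(n)  # start by trusting every sample
    for _ in range(n_rounds):
        # Retrain: ordinary least squares on the currently kept samples only.
        w, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        # Select: keep the samples with the smallest squared error under the new fit.
        losses = (X @ w - y) ** 2
        keep = np.argsort(losses)[:n_keep]
    return w, keep

# Synthetic data (an assumption for the demo): 15% of the labels are replaced by noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=200)
y[:30] = rng.normal(scale=5.0, size=30)

w_hat, kept = iterative_trimmed_least_squares(X, y, keep_fraction=0.8)
print("estimated weights:", np.round(w_hat, 3))
print("corrupted samples still kept:", int(np.sum(kept < 30)))
```

The kept fraction should not exceed the share of clean data one expects; here 80% is kept while 85% of the labels are clean, so the trimmed set can end up consisting almost entirely of uncorrupted samples.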

Deploying Lifelong Open-Domain Dialogue Learning

This work builds and deploys a role-playing game, whereby human players converse with learning agents situated in an open-domain fantasy world, and shows that by training models on the conversations they have with humans in the game the models progressively improve, as measured by automatic metrics and online engagement scores.

Recipes for Safety in Open-domain Chatbots

A new human-and-model-in-the-loop framework for both training safer models and for evaluating them, as well as a novel method to distill safety considerations inside generative models without the use of an external classifier at deployment time are introduced.

Ex Machina: Personal Attacks Seen at Scale

A method that combines crowdsourcing and machine learning to analyze personal attacks at scale is developed and illustrated, and an evaluation method for a classifier in terms of the aggregated number of crowd-workers it can approximate is shown.

Peer Loss Functions: Learning from Noisy Labels without Knowing Noise Rates

Performing ERM with peer loss functions on the noisy dataset leads to the optimal or a near-optimal classifier as if performing ERM over the clean training data, which the authors do not have access to.

Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling

This paper surveys the problem landscape for safety for end-to-end conversational AI, highlights tensions between values, potential positive impact and potential harms, and provides a framework for making decisions about whether and how to release these models, following the tenets of value-sensitive design.

Learning from Noisy Labels with Deep Neural Networks: A Survey

A comprehensive review of 62 state-of-the-art robust training methods, all of which are categorized into five groups according to their methodological difference, followed by a systematic comparison of six properties used to evaluate their superiority.

Queens Are Powerful Too: Mitigating Gender Bias in Dialogue Generation

This work measures gender bias in dialogue data, and examines how this bias is actually amplified in subsequent generative chit-chat dialogue models, and considers three techniques to mitigate gender bias: counterfactual data augmentation, targeted data collection, and bias controlled training.