Learning from data in the mixed adversarial non-adversarial case: Finding the helpers and ignoring the trolls
@article{Ju2022LearningFD,
  title={Learning from data in the mixed adversarial non-adversarial case: Finding the helpers and ignoring the trolls},
  author={Da Ju and Jing Xu and Y-Lan Boureau and Jason Weston},
  journal={ArXiv},
  year={2022},
  volume={abs/2208.03295}
}
The promise of interaction between intelligent conversational agents and humans is that models can learn from such feedback in order to improve. Unfortunately, such exchanges in the wild will not always involve human utterances that are benign or of high quality, and will include a mixture of engaged (helpers) and unengaged or even malicious users (trolls). In this work we study how to perform robust learning in such an environment. We introduce a benchmark evaluation, SafetyMix, which can…
6 Citations
Learning New Skills after Deployment: Improving open-domain internet-driven dialogue with human feedback
- Computer Science · ArXiv
- 2022
This work collects deployment data of human interactions, collects various types of human feedback, and studies various algorithms for improving from such feedback in order to make recommendations on which type of feedback and algorithms work best.
The CRINGE Loss: Learning what language not to model
- Computer Science · ArXiv
- 2022
This work proposes a novel procedure to train with negative data called the CRINGE loss (ContRastive Iterative Negative GEneration), and shows the effectiveness of this approach across three different experiments on the tasks of safe generation, contradiction avoidance, and open-domain dialogue.
Revision Transformers: Getting RiT of No-Nos
- Computer Science · ArXiv
- 2022
This work proposes the Revision Transformer (RiT), a combination of a large-scale pre-trained LM that inherently but also diffusely encodes world knowledge with a clear-structured revision engine that makes it possible to update the model's knowledge with little effort and the help of user interaction.
BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage
- Computer Science · ArXiv
- 2022
The goal of this research program is to enable the community to study ever-improving responsible agents that learn through interaction in BlenderBot 3, a 175B parameter dialogue model capable of open-domain conversation with access to the internet and a long-term memory.
When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels
- Computer Science · ArXiv
- 2022
JUICER, a framework that makes use of both binary and free-form textual human feedback, works by extending sparse binary feedback (training a satisfaction classifier to label the unlabeled data) and by training a reply corrector to map bad replies to good ones.
Towards Boosting the Open-Domain Chatbot with Human Feedback
- Computer Science · ArXiv
- 2022
A novel and efficient approach, Diamante, boosts the open-domain chatbot: two kinds of human feedback are collected and leveraged, and the implicit preference in the data collection process is introduced into generation-evaluation joint training.
References
Showing 1-10 of 46 references
Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack
- Computer Science · EMNLP
- 2019
It is shown that offensive language used within a conversation critically depends on the dialogue context, and cannot be viewed as a single sentence offensive detection task as in most previous work.
Explaining and Harnessing Adversarial Examples
- Computer Science · ICLR
- 2015
It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
Learning with Bad Training Data via Iterative Trimmed Loss Minimization
- Computer Science · ICML
- 2019
This paper proposes to iteratively minimize the trimmed loss, by alternating between selecting samples with lowest current loss, and retraining a model on only these samples, and proves that this process recovers the ground truth in generalized linear models with standard statistical assumptions.
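The alternating procedure this summary describes can be sketched in a few lines of numpy. This is a minimal illustrative sketch for the linear-regression case, not the paper's code; the function name, `keep_frac`, and `n_iters` are my own choices.

```python
import numpy as np

def iterative_trimmed_loss(X, y, keep_frac=0.8, n_iters=10):
    """Alternate between (a) fitting least squares on the current subset
    and (b) re-selecting the keep_frac fraction of samples whose current
    loss is lowest, as in iterative trimmed loss minimization."""
    n_keep = int(len(y) * keep_frac)
    idx = np.arange(len(y))                    # start with all samples
    for _ in range(n_iters):
        w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        losses = (X @ w - y) ** 2              # per-sample loss on all data
        idx = np.argsort(losses)[:n_keep]      # keep the lowest-loss samples
    return w, idx
```

With noiseless clean samples and a minority of corrupted labels, the selected subset quickly concentrates on the clean points and the fit recovers the true coefficients, matching the recovery guarantee the summary mentions for generalized linear models.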
Deploying Lifelong Open-Domain Dialogue Learning
- Computer Science · ArXiv
- 2020
This work builds and deploys a role-playing game, whereby human players converse with learning agents situated in an open-domain fantasy world, and shows that by training models on the conversations they have with humans in the game, the models progressively improve, as measured by automatic metrics and online engagement scores.
Recipes for Safety in Open-domain Chatbots
- Computer Science · ArXiv
- 2020
A new human-and-model-in-the-loop framework for both training safer models and for evaluating them, as well as a novel method to distill safety considerations inside generative models without the use of an external classifier at deployment time are introduced.
Ex Machina: Personal Attacks Seen at Scale
- Computer Science · WWW
- 2017
A method that combines crowdsourcing and machine learning to analyze personal attacks at scale is developed and illustrated, and an evaluation method for a classifier in terms of the aggregated number of crowd-workers it can approximate is shown.
Peer Loss Functions: Learning from Noisy Labels without Knowing Noise Rates
- Computer Science · ICML
- 2020
Performing ERM with peer loss functions on the noisy dataset leads to the optimal or a near-optimal classifier as if performing ERM over the clean training data, which the authors do not have access to.
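A peer loss pairs each prediction with the label of an independently drawn sample and subtracts that mismatched loss from the ordinary loss. The sketch below is a rough illustration under that construction; the function name and signature are my own, not the paper's API.

```python
import numpy as np

def peer_loss(scores, labels, base_loss, rng):
    """Peer loss: ordinary per-sample loss minus the base loss evaluated
    on an independently drawn peer prediction and peer label."""
    n = len(labels)
    i = rng.integers(0, n, size=n)   # random peer indices for predictions
    j = rng.integers(0, n, size=n)   # independent random peer label indices
    return base_loss(scores, labels) - base_loss(scores[i], labels[j])
```

The subtracted peer term penalizes a classifier that blindly matches the marginal label distribution, which is why minimizing the peer loss on noisy labels behaves like ERM on clean labels without needing the noise rates.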
Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling
- Computer Science · ArXiv
- 2021
This paper surveys the problem landscape for safety for end-to-end conversational AI, highlights tensions between values, potential positive impact and potential harms, and provides a framework for making decisions about whether and how to release these models, following the tenets of value-sensitive design.
Learning from Noisy Labels with Deep Neural Networks: A Survey
- Computer Science · IEEE Transactions on Neural Networks and Learning Systems
- 2022
A comprehensive review of 62 state-of-the-art robust training methods, all of which are categorized into five groups according to their methodological difference, followed by a systematic comparison of six properties used to evaluate their superiority.
Queens Are Powerful Too: Mitigating Gender Bias in Dialogue Generation
- Economics · EMNLP
- 2020
This work measures gender bias in dialogue data, and examines how this bias is actually amplified in subsequent generative chit-chat dialogue models, and considers three techniques to mitigate gender bias: counterfactual data augmentation, targeted data collection, and bias controlled training.