Analyzing Dynamic Adversarial Training Data in the Limit
@inproceedings{Wallace2021AnalyzingDA,
  title={Analyzing Dynamic Adversarial Training Data in the Limit},
  author={Eric Wallace and Adina Williams and Robin Jia and Douwe Kiela},
  booktitle={Findings},
  year={2021}
}
To create models that are robust across a wide range of test inputs, training datasets should include diverse examples that span numerous phenomena. Dynamic adversarial data collection (DADC), where annotators craft examples that challenge continually improving models, holds promise as an approach for generating such diverse training sets. Prior work has shown that running DADC over 1-3 rounds can help models fix some error types, but it does not necessarily lead to better generalization beyond…
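The DADC loop itself is simple to state. Below is a minimal sketch of the procedure described in the abstract; `train_model`, `annotator_writes_example`, and the keep-only-model-fooling-examples policy are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of dynamic adversarial data collection (DADC).
# train_model() and annotator_writes_example() are hypothetical placeholders.

def dadc(seed_data, num_rounds, examples_per_round):
    data = list(seed_data)
    model = train_model(data)                  # model-in-the-loop for round 1
    for _ in range(num_rounds):
        new_examples = []
        while len(new_examples) < examples_per_round:
            ex = annotator_writes_example()    # human crafts a candidate example
            if model.predict(ex.input) != ex.label:
                new_examples.append(ex)        # keep examples that fool the current model
        data.extend(new_examples)
        model = train_model(data)              # retrain; the next round faces a stronger model
    return model, data
```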
9 Citations
Adversarially Constructed Evaluation Sets Are More Challenging, but May Not Be Fair
- Computer Science · DADC · 2022
This work studies the impact of applying three common approaches for adversarial dataset creation: filtering out easy examples, perturbing examples, and model-in-the-loop data collection (ANLI and AdversarialQA), across 18 different adversary models.
Models in the Loop: Aiding Crowdworkers with Generative Annotation Assistants
- Computer Science · NAACL · 2022
This work introduces Generative Annotation Assistants (GAAs): generator-in-the-loop models that provide real-time suggestions which annotators can approve, modify, or reject entirely. GAAs are found to yield higher downstream model performance on a variety of question answering tasks than adversarial data collection alone.
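As a rough illustration of the approve/modify/reject workflow, here is a hypothetical sketch; `generator.suggest` and `annotator_review` are invented placeholders, not the paper's API.

```python
# Hypothetical sketch of one generator-in-the-loop annotation step.

def annotate_with_gaa(context, generator, annotator_review):
    suggestion = generator.suggest(context)          # real-time generated candidate
    verdict, edited = annotator_review(suggestion)   # 'approve' | 'modify' | 'reject'
    if verdict == "approve":
        return suggestion
    if verdict == "modify":
        return edited                                # annotator's corrected version
    return None                                      # rejected; annotator writes from scratch
```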
Overconfidence in the Face of Ambiguity with Adversarial Data
- Computer Science · DADC · 2022
This work investigates whether models trained on adversarially collected data are miscalibrated with respect to the ambiguity of their inputs, and finds no clear difference in accuracy between naturalistically and adversarially trained models.
Adversarial Training for High-Stakes Reliability
- Computer Science · ArXiv · 2022
This work created a series of adversarial training techniques, including a tool that assists human adversaries, to find and eliminate failures in a classifier that filters text completions suggested by a generator. Adversarial training increased robustness to the attacks it trained on without affecting in-distribution performance.
ANLIzing the Adversarial Natural Language Inference Dataset
- Computer Science · SCIL · 2022
We perform an in-depth error analysis of Adversarial NLI (ANLI), a recently introduced large-scale human-and-model-in-the-loop natural language inference dataset collected over multiple rounds. We…
Benchmarking Long-tail Generalization with Likelihood Splits
- Computer Science · ArXiv · 2022
This work proposes a method for creating challenging benchmarks that require generalization to the tail of the distribution: existing datasets are re-split into 'Likelihood splits', where examples assigned lower likelihood by a pretrained language model go into the test set and more likely examples go into the training set.
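To make the re-splitting procedure concrete, here is a small sketch under stated assumptions: GPT-2 as the scoring model and a 20% test tail are arbitrary illustrative choices, not the paper's settings.

```python
# Sketch of a "Likelihood split": score each example with a pretrained LM
# and send the lowest-likelihood tail to the test set.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # illustrative model choice
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def log_likelihood(text):
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)                # out.loss = mean NLL per predicted token
    return -out.loss.item() * (ids.shape[1] - 1)    # total log-probability of the sequence

def likelihood_split(examples, test_fraction=0.2):
    ranked = sorted(examples, key=log_likelihood)   # least likely examples first
    cut = int(len(ranked) * test_fraction)
    return ranked[cut:], ranked[:cut]               # (train: likely, test: unlikely tail)
```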
Red Teaming Language Models with Language Models
- Computer Science · ArXiv · 2022
This work automatically finds cases where a target LM behaves in a harmful way, by generating test cases (“red teaming”) using another LM, and evaluates the target LM’s replies to generated test questions using a classifier trained to detect offensive content.
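The generate-then-classify loop can be sketched in a few lines; `red_lm`, `target_lm`, and `offense_clf` below are hypothetical stand-ins for whichever generator, target, and classifier are used.

```python
# Sketch of red teaming a language model with another language model:
# one LM generates test questions, the target replies, and a classifier
# flags harmful replies. All three components are hypothetical stand-ins.

def red_team(red_lm, target_lm, offense_clf, n_cases=1000, threshold=0.5):
    failures = []
    for _ in range(n_cases):
        question = red_lm.generate("Write a question that might elicit a harmful reply:")
        reply = target_lm.generate(question)
        if offense_clf.score(reply) > threshold:    # classifier judges the reply offensive
            failures.append((question, reply))
    return failures                                 # test cases where the target misbehaved
```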
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
- Computer Science · ArXiv · 2022
It is found that RLHF models become increasingly difficult to red team as they scale, whereas the other model types show a flat trend with scale. The authors also release their red-teaming data and describe their process in detail, in the hope that this transparency accelerates the community's ability to develop shared norms, practices, and technical standards.
Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks
- Computer Science · ACL · 2022
We introduce Dynatask: an open source system for setting up custom NLP tasks that aims to greatly lower the technical knowledge and effort required for hosting and evaluating state-of-the-art NLP…
References
Showing 1-10 of 69 references.
On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study
- Computer Science · ACL · 2021
Across a variety of models and datasets, it is found that models trained on adversarial data usually perform better on other adversarial datasets but worse on a diverse collection of out-of-domain evaluation sets.
Explaining and Harnessing Adversarial Examples
- Computer Science · ICLR · 2015
It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature. This view is supported by new quantitative results and provides the first explanation of the most intriguing fact about adversarial examples: that they generalize across architectures and training sets.
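This is also the paper that introduced the fast gradient sign method (FGSM), which follows directly from the linearity view: perturb the input in the direction of the sign of the loss gradient. A minimal PyTorch sketch (the function name and interface are ours, not the paper's):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Fast gradient sign method: x_adv = x + eps * sign(grad_x J(theta, x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)        # J(theta, x, y)
    loss.backward()                            # populates x.grad
    return (x + eps * x.grad.sign()).detach()  # one-step adversarial example
```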
Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension
- Computer Science · Transactions of the Association for Computational Linguistics · 2020
This work investigates this annotation methodology and applies it in three different settings, collecting a total of 36,000 samples with progressively stronger models in the annotation loop, finding that stronger models can still learn from datasets collected with substantially weaker models-in-the-loop.
Adversarial Filters of Dataset Biases
- Computer Science · ICML · 2020
This work presents extensive supporting evidence that AFLite is broadly applicable for reduction of measurable dataset biases, and that models trained on the filtered datasets yield better generalization to out-of-distribution tasks.
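A rough sketch of AFLite-style filtering, assuming precomputed feature vectors `X` and labels `y`; the partition counts and drop fraction below are illustrative choices, not the paper's values.

```python
# AFLite-style filtering sketch: repeatedly fit simple linear models on random
# partitions, score how often each held-out example is predicted correctly,
# and drop the most predictable examples.

import numpy as np
from sklearn.linear_model import LogisticRegression

def aflite(X, y, n_iters=50, train_frac=0.8, drop_frac=0.05):
    keep = np.arange(len(y))                        # indices still in the dataset
    for _ in range(n_iters):
        hits, counts = np.zeros(len(keep)), np.zeros(len(keep))
        for _ in range(10):                         # random train/held-out partitions
            perm = np.random.permutation(len(keep))
            split = int(train_frac * len(keep))
            tr, te = perm[:split], perm[split:]
            clf = LogisticRegression(max_iter=1000).fit(X[keep[tr]], y[keep[tr]])
            hits[te] += clf.predict(X[keep[te]]) == y[keep[te]]
            counts[te] += 1
        score = np.divide(hits, counts, out=np.zeros_like(hits), where=counts > 0)
        n_drop = int(drop_frac * len(keep))
        keep = keep[np.argsort(score)[: len(keep) - n_drop]]  # drop the most predictable
    return keep                                     # indices of the filtered dataset
```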
Adversarial NLI: A New Benchmark for Natural Language Understanding
- Computer Science · ACL · 2020
This work introduces a new large-scale NLI benchmark dataset, collected via an iterative, adversarial human-and-model-in-the-loop procedure, and shows that non-expert annotators are successful at finding the models' weaknesses.
SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference
- Computer Science · EMNLP · 2018
This paper introduces the task of grounded commonsense inference, unifying natural language inference and commonsense reasoning, and proposes Adversarial Filtering (AF), a novel procedure that constructs a de-biased dataset by iteratively training an ensemble of stylistic classifiers, and using them to filter the data.
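In a similar spirit to AFLite above, here is a toy sketch of the iterative replace-the-easy-negatives loop that AF describes; the feature representation, 10% replacement rate, and `regenerate` callback are hypothetical.

```python
# Adversarial Filtering (AF) sketch: train a classifier to spot machine-written
# endings, then replace the endings it detects most easily with fresh generations.

import numpy as np
from sklearn.linear_model import LogisticRegression

def adversarial_filter(real_feats, fake_feats, regenerate, n_iters=10):
    # real_feats / fake_feats: feature vectors for human vs. generated endings.
    # regenerate(): hypothetical callback returning features of a new generated ending.
    for _ in range(n_iters):
        X = np.vstack([real_feats, fake_feats])
        y = np.r_[np.ones(len(real_feats)), np.zeros(len(fake_feats))]
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        p_fake = 1 - clf.predict_proba(fake_feats)[:, 1]      # confidence ending is fake
        easy = np.argsort(-p_fake)[: len(fake_feats) // 10]   # most detectable 10%
        for i in easy:
            fake_feats[i] = regenerate()                      # swap in a harder candidate
    return fake_feats                                         # de-biased negative endings
```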
What Will it Take to Fix Benchmarking in Natural Language Understanding?
- Computer Science · NAACL · 2021
It is argued that most current benchmarks fail to meet these criteria, and that adversarially constructed, out-of-distribution test sets do not meaningfully address the causes of these failures.
Counterfactually-Augmented SNLI Training Data Does Not Yield Better Generalization Than Unaugmented Data
- Computer Science · INSIGHTS · 2020
It is found that models trained on a counterfactually-augmented SNLI dataset do not generalize better than models trained on unaugmented datasets of similar size, and that counterfactual augmentation can hurt performance, yielding models that are less robust to challenge examples.
Semantically Equivalent Adversarial Rules for Debugging NLP models
- Computer Science · ACL · 2018
This work presents semantically equivalent adversaries (SEAs), semantics-preserving perturbations that change the model's predictions, and generalizes them into simple replacement rules that induce adversaries on many semantically similar instances.
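A toy illustration of applying one such replacement rule and keeping perturbations that flip the prediction; the specific rule and the `model.predict` interface are made up for illustration.

```python
# Apply a semantics-preserving replacement rule and check whether the
# model's prediction changes. Rule and model interface are illustrative.

def apply_sear(model, text, rule=("What", "Which")):
    perturbed = text.replace(rule[0], rule[1], 1)   # semantics-preserving rewrite
    if perturbed != text and model.predict(perturbed) != model.predict(text):
        return perturbed                            # found an adversarial rewrite
    return None                                     # rule did not apply or prediction held
```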
New Protocols and Negative Results for Textual Entailment Data Collection
- Computer Science · EMNLP · 2020
Four alternative protocols are proposed, each aimed at improving either the ease with which annotators can produce sound training examples or the quality and diversity of those examples, and it is observed that all four new protocols reduce previously observed issues with annotation artifacts.