Generating Label Cohesive and Well-Formed Adversarial Claims

@inproceedings{Atanasova2020GeneratingLC,
  title={Generating Label Cohesive and Well-Formed Adversarial Claims},
  author={Pepa Atanasova and Dustin Wright and Isabelle Augenstein},
  booktitle={EMNLP},
  year={2020}
}
Adversarial attacks reveal important vulnerabilities and flaws of trained models. One potent type of attack is the universal adversarial trigger: an individual n-gram that, when appended to instances of a class under attack, can trick a model into predicting a target class. However, for inference tasks such as fact checking, these triggers often inadvertently invert the meaning of the instances they are inserted into. In addition, such attacks produce semantically nonsensical inputs, as they simply concatenate triggers to existing samples.
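To make the trigger setting concrete, the sketch below shows how such an attack is typically evaluated: a candidate trigger is concatenated to claims of the class under attack, and the attack's potency is the fraction of claims the model then assigns to the target class. The `predict_label` interface, the toy classifier, and the example claims are hypothetical stand-ins, not the paper's models or data; the toy trigger "never" also illustrates the problem the abstract raises, since it flips the prediction precisely by inverting the claim's meaning.

```python
# Minimal sketch of evaluating a universal adversarial trigger against a
# fact-checking classifier. `predict_label`, the toy classifier, and the
# example claims are hypothetical placeholders, not the paper's setup.
from typing import Callable, List


def trigger_success_rate(claims: List[str],
                         trigger: str,
                         target_label: str,
                         predict_label: Callable[[str], str]) -> float:
    """Fraction of claims pushed to `target_label` once the trigger is appended."""
    flipped = 0
    for claim in claims:
        adversarial_claim = f"{claim} {trigger}"  # triggers are simply concatenated
        if predict_label(adversarial_claim) == target_label:
            flipped += 1
    return flipped / len(claims) if claims else 0.0


# Toy stand-in classifier (assumption: FEVER-style SUPPORTED/REFUTED labels).
def toy_classifier(claim: str) -> str:
    return "REFUTED" if "never" in claim.lower() else "SUPPORTED"


supported_claims = [
    "Paris is the capital of France.",
    "Water boils at 100 degrees Celsius at sea level.",
]
rate = trigger_success_rate(supported_claims, trigger="never",
                            target_label="REFUTED", predict_label=toy_classifier)
print(f"Trigger success rate: {rate:.2f}")  # 1.00 for this toy example
```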

Citations

Zero-shot Fact Verification by Claim Generation
TLDR: QACG, a framework for training a robust fact verification model by using automatically generated claims that can be supported, refuted, or unverifiable from Wikipedia evidence, is developed.
Stance Detection Benchmark: How Robust Is Your Stance Detection?
TLDR: A stance detection (StD) benchmark is introduced that learns from ten StD datasets of various domains in a multi-dataset learning (MDL) setting, as well as from related tasks via transfer learning; the results suggest the existence of biases inherited from multiple datasets by design.
How Robust are Fact Checking Systems on Colloquial Claims?
TLDR: Existing fact checking systems that perform well on claims in formal style are found to degrade significantly on colloquial claims with the same semantics, and document retrieval is shown to be the weakest spot in the pipeline, vulnerable even to filler words such as “yeah” and “you know”.
Text Generation with Efficient (Soft) Q-Learning
TLDR: This paper introduces a new RL formulation for text generation from the soft Q-learning perspective, which enables it to draw from the latest RL advances, such as path consistency learning, to combine the best of on-/off-policy updates and learn effectively from sparse reward.
A Review on Fact Extraction and Verification
We study the fact checking problem, which aims to identify the veracity of a given claim. Specifically, we focus on the task of Fact Extraction and VERification (FEVER) and its accompanying dataset.
Explainable Automated Fact-Checking: A Survey
TLDR: This survey focuses on the explanation functionality – that is, fact-checking systems providing reasons for their predictions – summarizing existing methods for explaining the predictions of fact-checking systems and exploring trends in this topic.
A Diagnostic Study of Explainability Techniques for Text Classification
TLDR: A comprehensive list of diagnostic properties for evaluating existing explainability techniques is developed; gradient-based explanations are found to perform best across tasks and model architectures, and further insights into the properties are presented.
A Survey on Stance Detection for Mis- and Disinformation Identification
TLDR: This survey examines the relationship between stance detection and mis- and disinformation detection from a holistic viewpoint, and reviews and analyzes existing work in this area.
Universal Adversarial Attacks with Natural Triggers for Text Classification
TLDR: This work develops adversarial attacks that appear closer to natural English phrases yet confuse classification systems when added to benign inputs; it leverages an adversarially regularized autoencoder to generate triggers and proposes a gradient-based search that aims to maximize the downstream classifier’s prediction loss.
Claim Check-Worthiness Detection as Positive Unlabelled Learning
TLDR: The best performing method is a unified approach that treats the task as positive unlabelled learning, automatically correcting for instances that were incorrectly labelled as not check-worthy.

References

Showing 1–10 of 35 references
Semantically Equivalent Adversarial Rules for Debugging NLP models
TLDR: This work presents semantically equivalent adversaries (SEAs) – semantics-preserving perturbations that induce changes in the model’s predictions – and generalizes them into rules that induce such adversaries on many semantically similar instances.
Evaluating adversarial attacks against multiple fact verification systems
TLDR: This work evaluates adversarial instances generated by a recently proposed state-of-the-art method, a paraphrasing method, and rule-based attacks devised for fact verification, and finds that the rule-based attacks have higher potency and that, while the rankings among the top systems changed, they exhibited higher resilience than the baselines.
DeSePtion: Dual Sequence Prediction and Adversarial Examples for Improved Fact-Checking
TLDR: This work shows that current systems for FEVER are vulnerable to three categories of realistic challenges for fact-checking – multiple propositions, temporal reasoning, and ambiguity and lexical variation – introduces a resource with these types of claims, and presents a system designed to be resilient to these “attacks”.
On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models
TLDR: A new evaluation framework for adversarial attacks on seq2seq models that takes the semantic equivalence of the pre- and post-perturbation input into account is proposed, and it is shown that performing untargeted adversarial training with meaning-preserving attacks improves adversarial robustness without hurting test performance.
Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency
TLDR: A new word replacement order determined by both the word saliency and the classification probability is introduced, and a greedy algorithm called probability weighted word saliency (PWWS) is proposed for text adversarial attacks; a minimal sketch of this replacement ordering follows the reference list.
Semantics Preserving Adversarial Learning.
TLDR: This paper proposes an efficient algorithm whereby the semantics of the inputs are leveraged as a source of knowledge upon which to impose adversarial constraints, and shows its effectiveness in producing semantics-preserving adversarial examples that evade existing defenses against adversarial attacks.
The FEVER2.0 Shared Task
TLDR: There was a great variety in adversarial attack types as well as in the techniques used to generate the attacks, highlighting commonalities and innovations among participating systems.
Explaining and Harnessing Adversarial Examples
TLDR: It is argued that the primary cause of neural networks’ vulnerability to adversarial perturbations is their linear nature, supported by new quantitative results and a first explanation of the most intriguing fact about adversarial examples: their generalization across architectures and training sets.
Adversarial Attacks on Deep-learning Models in Natural Language Processing
TLDR: A systematic survey is presented that covers preliminary knowledge of NLP and seminal related work in computer vision, collects all related academic work since the first appearance of such attacks in 2017, and analyzes 40 representative works in a comprehensive way.
Universal Adversarial Perturbation for Text Classification
TLDR: This work proposes an algorithm to compute universal adversarial perturbations and shows that state-of-the-art deep neural networks are highly vulnerable to them, even though the perturbations keep the neighborhood of tokens mostly preserved.
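
For a concrete sense of the PWWS ordering referenced above, the following is a minimal sketch under simplifying assumptions: whitespace tokenization, a classifier exposed as a `true_class_prob(text)` callable returning the probability of the original label, and a hypothetical `synonyms(word)` lookup (for example, a WordNet query). It is illustrative rather than the authors' implementation, which stops when the predicted label actually changes instead of using a fixed confidence threshold.

```python
# Simplified sketch of probability weighted word saliency (PWWS) for a text
# adversarial attack. Assumptions: whitespace tokenization, `true_class_prob`
# returns the classifier's probability for the original label, and `synonyms`
# is a hypothetical substitute-word lookup. Not the authors' implementation.
import math
from typing import Callable, List


def pwws_attack(text: str,
                true_class_prob: Callable[[str], float],
                synonyms: Callable[[str], List[str]],
                unk_token: str = "[UNK]") -> str:
    words = text.split()
    base = true_class_prob(text)

    def replaced(i: int, token: str) -> str:
        return " ".join(words[:i] + [token] + words[i + 1:])

    # Word saliency: how much the true-class probability drops when a word is masked.
    saliency = [base - true_class_prob(replaced(i, unk_token)) for i in range(len(words))]
    norm = sum(math.exp(s) for s in saliency)
    saliency = [math.exp(s) / norm for s in saliency]  # softmax over positions

    # For each position, pick the synonym causing the largest probability drop;
    # the PWWS score weights that drop by the position's normalized saliency.
    scored = []
    for i, word in enumerate(words):
        candidates = synonyms(word)
        if not candidates:
            continue
        best = max(candidates, key=lambda c: base - true_class_prob(replaced(i, c)))
        drop = base - true_class_prob(replaced(i, best))
        scored.append((saliency[i] * drop, i, best))

    # Greedily apply replacements in descending score order until the classifier
    # is no longer confident in the original label (simplified stopping rule).
    for _, i, best in sorted(scored, reverse=True):
        words[i] = best
        if true_class_prob(" ".join(words)) < 0.5:
            break
    return " ".join(words)
```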