Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations

@inproceedings{Camburu2020MakeUY,
  title={Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations},
  author={Oana-Maria Camburu and Brendan Shillingford and Pasquale Minervini and Thomas Lukasiewicz and Phil Blunsom},
  booktitle={ACL},
  year={2020}
}
To increase trust in artificial intelligence systems, a promising research direction consists of designing neural models capable of generating natural language explanations for their predictions. In this work, we show that such models are nonetheless prone to generating mutually inconsistent explanations, such as “Because there is a dog in the image.” and “Because there is no dog in the [same] image.”, exposing flaws in either the decision-making process of the model or in the generation of the…
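
As a rough illustration of the kind of inconsistency the abstract describes, the minimal sketch below flags a pair of explanations for the same input when one is a simple string-level negation of the other, mirroring the dog / no-dog example above. This is a hypothetical toy example, not the adversarial framework proposed in the paper; the function names and the negation rule are invented for this sketch.

# Hypothetical illustration only: the names and the string-level negation
# rule are invented here and are not the paper's adversarial framework.
import re

def naive_negation(explanation: str) -> str:
    """Toy rewrite mirroring the dog / no-dog example: 'there is a X' <-> 'there is no X'."""
    if re.search(r"\bthere is no\b", explanation):
        return re.sub(r"\bthere is no\b", "there is a", explanation)
    return re.sub(r"\bthere is a\b", "there is no", explanation)

def are_inconsistent(expl_a: str, expl_b: str) -> bool:
    """Flag two explanations for the same input as inconsistent if one is
    (up to the naive rewrite above) the negation of the other."""
    norm = lambda s: s.lower().strip(" .")
    return norm(naive_negation(expl_a)) == norm(expl_b)

print(are_inconsistent("Because there is a dog in the image.",
                       "Because there is no dog in the image."))  # True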

Citations

Can Rationalization Improve Robustness?
TLDR
This paper systematically generates various types of ‘AddText’ attacks for both token and sentence-level rationalization tasks and performs an extensive empirical evaluation of state-of-the-art rationale models across different tasks.
NILE : Natural Language Inference with Faithful Natural Language Explanations
TLDR
This work proposes Natural-language Inference over Label-specific Explanations (NILE), a novel NLI method which utilizes auto-generated label-specific NL explanations to produce labels along with their faithful explanations, and demonstrates NILE’s effectiveness over previously reported methods through automated and human evaluation of the produced labels and explanations.
Generating Fluent Chinese Adversarial Examples for Sentiment Classification
TLDR
A new method, AD-ER (Adversarial Examples with Readability), generates Chinese natural language adversarial examples that have good readability and diversity and are more fluent and harder to detect.
LIREx: Augmenting Language Inference with Relevant Explanation
TLDR
Qualitative analysis shows that LIREx generates flexible, faithful, and relevant NLEs that allow the model to be more robust to spurious explanations, and achieves significantly better performance than previous studies when transferred to the out-of-domain MultiNLI data set.
Beyond Distributional Hypothesis: Let Language Models Learn Meaning-Text Correspondence
TLDR
A novel intermediate training task, named meaning-matching, is proposed, designed to directly learn a meaning-text correspondence, instead of relying on the distributional hypothesis, which enables PLMs to learn lexical semantic information.
e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
TLDR
e-ViL is a benchmark for explainable vision-language tasks that establishes a unified evaluation framework and provides the first comprehensive comparison of existing approaches that generate NLEs for VL tasks.
Self-Consistency Improves Chain of Thought Reasoning in Language Models
TLDR
A simple ensemble strategy, self-consistency, that robustly improves accuracy across a variety of language models and model scales without the need for additional training or auxiliary models is explored.
Rationale-Inspired Natural Language Explanations with Commonsense
TLDR
This paper introduces a self-rationalizing framework, called REXC, that extracts rationales as the features most responsible for the predictions, expands the extractive rationales using commonsense resources, and selects the best-suited commonsense knowledge to generate NLEs and give the final prediction.
Do Natural Language Explanations Represent Valid Logical Arguments? Verifying Entailment in Explainable NLI Gold Standards
TLDR
A systematic annotation methodology, named Explanation Entailment Verification (EEV), is proposed, to quantify the logical validity of human-annotated explanations, and confirms that the inferential properties of explanations are still poorly formalised and understood.
Scientific Explanation and Natural Language: A Unified Epistemological-Linguistic Perspective for Explainable AI
A fundamental research goal for Explainable AI (XAI) is to build models that are capable of reasoning through the generation of natural language explanations. However, the methodologies to design…

References

Showing 1-10 of 38 references
Generating Natural Adversarial Examples
TLDR
This paper proposes a framework to generate natural and legible adversarial examples that lie on the data manifold by searching the semantic space of a dense and continuous data representation, utilizing recent advances in generative adversarial networks.
Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge
TLDR
This paper reduces the problem of automatically generating adversarial examples that violate a set of given First-Order Logic constraints in Natural Language Inference to an optimisation problem, by maximising a quantity that measures the degree of violation of such constraints and by using a language model to generate linguistically plausible examples.
Pathologies of Neural Models Make Interpretations Difficult
TLDR
This work uses input reduction, which iteratively removes the least important word from the input, to expose pathological behaviors of neural models: the remaining words appear nonsensical to humans and are not the ones determined as important by interpretation methods.
e-SNLI: Natural Language Inference with Natural Language Explanations
TLDR
The Stanford Natural Language Inference dataset is extended with an additional layer of human-annotated natural language explanations of the entailment relations, which can be used for various goals, such as obtaining full sentence justifications of a model’s decisions, improving universal sentence representations and transferring to out-of-domain NLI datasets.
Grounding Visual Explanations
TLDR
A phrase-critic model refines generated candidate explanations, augmented with flipped phrases, to improve the textual explanation quality of fine-grained classification decisions on the CUB dataset by mentioning phrases that are grounded in the image.
Adversarial Attacks on Deep-learning Models in Natural Language Processing
TLDR
A systematic survey is presented that covers preliminary knowledge of NLP and related seminal works in computer vision, collects all related academic works since their first appearance in 2017, and analyzes 40 representative works in a comprehensive way.
Adversarial Example Generation with Syntactically Controlled Paraphrase Networks
TLDR
A combination of automated and human evaluations show that SCPNs generate paraphrases that follow their target specifications without decreasing paraphrase quality when compared to baseline (uncontrolled) paraphrase systems.
Generating Textual Adversarial Examples for Deep Learning Models: A Survey
TLDR
This article reviews research works that address this difference and generate textual adversarial examples for DNNs; it collects, selects, summarizes, discusses, and analyzes these works in a comprehensive way and covers all related information to make the article self-contained.
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
TLDR
LIME is proposed, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner by learning an interpretable model locally around the prediction.
Annotation Artifacts in Natural Language Inference Data
TLDR
It is shown that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI and 53% of MultiNLI, and that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes.