Corpus ID: 102352295

Lucid Explanations Help: Using a Human-AI Image-Guessing Game to Evaluate Machine Explanation Helpfulness

Arijit Ray, Giedrius Burachas, Yi Yao, Ajay Divakaran
While there have been many proposals on how to make AI algorithms more transparent, few have attempted to evaluate the impact of AI explanations on human performance in a task that involves AI. [...]
Key Result: We observe that "helpful" explanations are conducive to better game performance (by almost 22% in games whose explanations were rated "excellent"), and that having at least one "correct" explanation is significantly helpful when the VQA system's answers are mostly noisy (by almost 30% compared to games without explanations).
3 Citations
A Study on Multimodal and Interactive Explanations for Visual Question Answering
The results indicate that the explanations help improve human prediction accuracy, especially in trials where the VQA system's answer is inaccurate, and suggest that these explanations are effective in human-machine collaboration tasks.
The Impact of Explanations on AI Competency Prediction in VQA
This paper introduces an explainable visual question answering (VQA) system that uses spatial and object features and is powered by the BERT language model, and evaluates the impact of explanations on the user's mental model of the AI agent's competency in the VQA task.
Improving Users' Mental Model with Attention-directed Counterfactual Edits
This work shows that showing controlled counterfactual image-question examples is more effective at improving users' mental models than simply showing random examples, and compares a generative approach and a retrieval-based approach to showing counterfactual examples.

References
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence
It is shown quantitatively that training with textual explanations not only yields better textual justification models but also better localizes the evidence that supports the decision, supporting the thesis that multimodal explanation models offer significant benefits over unimodal approaches.
Evaluating Visual Conversational Agents via Cooperative Human-AI Games
A cooperative game, GuessWhich, is designed to measure human-AI team performance in the specific context of the AI being a visual conversational agent. The results suggest a counterintuitive trend: although the AI literature shows that one version of the agent outperforms the other when paired with an AI questioner bot, this improvement in AI-AI performance does not translate to improved human-AI performance.
VQA: Visual Question Answering
We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer.
Do explanations make VQA models more predictable to a human?
This work analyzes whether existing explanations make a VQA model (its responses as well as its failures) more predictable to a human, finds that they do not, and finds that human-in-the-loop approaches that treat the model as a black box do.
GuessWhat?! Visual Object Discovery through Multi-modal Dialogue
We introduce GuessWhat?!, a two-player guessing game as a testbed for research on the interplay of computer vision and dialogue systems. The goal of the game is to locate an unknown object in a rich [...]
Generating Visual Explanations
A new model is proposed that focuses on the discriminating properties of the visible object, jointly predicts a class label, explains why the predicted label is appropriate for the image, and generates sentences that realize a global sentence property, such as class specificity.
Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering
This model, while architecturally simple and relatively small in terms of trainable parameters, sets a new state of the art on both the unbalanced and the balanced VQA benchmarks.
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
This work poses a cooperative ‘image guessing’ game between two agents who communicate in natural language dialog so that Q-BOT can select an unseen image from a lineup of images, and shows the emergence of grounded language and communication among ‘visual’ dialog agents with no human supervision.
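To make the lineup-selection step concrete, here is a minimal sketch. It assumes, for illustration, that Q-BOT ends the dialog with a predicted image feature vector and ranks the lineup by Euclidean distance to that prediction; the helper names (`euclidean`, `rank_lineup`, `target_rank`), the image ids, and the 3-dimensional toy features are hypothetical and not taken from the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_lineup(predicted: Vector, lineup: Dict[str, Vector]) -> List[str]:
    """Order lineup image ids from closest to farthest from the predicted features."""
    return sorted(lineup, key=lambda image_id: euclidean(predicted, lineup[image_id]))


def target_rank(predicted: Vector, lineup: Dict[str, Vector], target_id: str) -> int:
    """1-based rank of the unseen target image (1 = it would be selected first)."""
    return rank_lineup(predicted, lineup).index(target_id) + 1


# Hypothetical lineup of three images with toy 3-d features.
lineup = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.2, 0.8, 0.1],
    "img_c": [0.1, 0.1, 0.9],
}
# Hypothetical Q-BOT feature prediction after the dialog.
predicted = [0.3, 0.7, 0.2]

print(rank_lineup(predicted, lineup))  # → ['img_b', 'img_a', 'img_c']
print(target_rank(predicted, lineup, "img_b"))  # → 1
```

With these toy vectors `img_b` is closest to the prediction, so its rank is 1; under this assumed setup, a lower target rank means the dialog conveyed more useful information about the unseen image.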
Human Attention in Visual Question Answering: Do Humans and Deep Networks look at the same regions?
The VQA-HAT (Human ATtention) dataset is introduced, and attention maps generated by state-of-the-art VQA models are evaluated against human attention both qualitatively and quantitatively.
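A rank correlation is one standard way to compare a model's attention map with a human attention map quantitatively. The sketch below computes Spearman's rank correlation over two maps that are assumed to be resized to the same grid and flattened into lists; the function names and toy values are hypothetical, and the exact evaluation protocol should be checked against the paper itself.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over the run of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant map")
    return cov / math.sqrt(var_x * var_y)


def spearman(model_map: Sequence[float], human_map: Sequence[float]) -> float:
    """Spearman rank correlation between two flattened attention maps."""
    if len(model_map) != len(human_map):
        raise ValueError("attention maps must have the same number of cells")
    return pearson(average_ranks(model_map), average_ranks(human_map))


# Hypothetical 2x2 attention maps, flattened row by row.
human_map = [0.1, 0.7, 0.2, 0.0]
model_map = [0.05, 0.6, 0.3, 0.05]
print(round(spearman(model_map, human_map), 4))  # → 0.9487
```

Ties receive the average of their ranks, so maps with uniform regions are still handled; a value near 1 means the model orders image regions by importance in nearly the same way humans do.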
Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering
The Spatial Memory Network, a novel spatial attention architecture that aligns words with image patches in the first hop, is proposed, and it obtains improved results compared to a strong deep baseline model that concatenates image and question features to predict the answer.
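The word-to-patch alignment in the first hop can be illustrated with a simplified sketch: each image patch is scored by its highest dot-product correlation with any question word, the scores are normalized with a softmax over patches, and the patch features are pooled with those weights. The max-over-words pooling, the helper names, and the toy 2-dimensional vectors are assumptions for illustration; the actual model uses learned embeddings and further components not shown here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product correlation between a word vector and a patch vector."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Softmax with the maximum subtracted first for numerical stability."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_guided_weights(words: Sequence[Vector], patches: Sequence[Vector]) -> List[float]:
    """Attention weight per patch from its best-matching question word (assumed max pooling)."""
    patch_scores = [max(dot(w, p) for w in words) for p in patches]
    return softmax(patch_scores)


def attend(words: Sequence[Vector], patches: Sequence[Vector]) -> List[float]:
    """Weighted sum of patch features using the word-guided attention weights."""
    weights = word_guided_weights(words, patches)
    dim = len(patches[0])
    return [sum(w * p[d] for w, p in zip(weights, patches)) for d in range(dim)]


# Hypothetical toy embeddings: two question words and three image patches.
words = [[1.0, 0.0], [0.0, 0.5]]
patches = [[2.0, 0.0], [0.0, 0.0], [0.0, 1.0]]

weights = word_guided_weights(words, patches)
print([round(w, 3) for w in weights])
print([round(v, 3) for v in attend(words, patches)])
```

Patch 0 aligns most strongly with the first word, so it receives the largest weight and dominates the attended feature vector.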