Corpus ID: 7006203

Interpreting Visual Question Answering Models

Yash Goyal, Akrit Mohapatra, Devi Parikh, Dhruv Batra
Deep neural networks have shown striking progress and obtained state-of-the-art results in many AI research fields in recent years. However, it is often unsatisfying not to know why they predict what they do. In this paper, we address the problem of interpreting Visual Question Answering (VQA) models. Specifically, we are interested in finding which parts of the input (pixels in images or words in questions) the VQA model focuses on while answering the question. To tackle this problem, we use… 
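The abstract is cut off before it names the method, but a common way to estimate this kind of input importance (an illustrative sketch, not necessarily the authors' technique) is occlusion: mask out a region of the input and measure how much the model's confidence in its answer drops. A minimal version for images, with `predict` standing in for any classifier that returns class probabilities:

```python
import numpy as np

def occlusion_importance(predict, image, target_class, patch=8, stride=8):
    """Score each patch by the drop in the target-class score
    when that patch is masked out (set to zero)."""
    h, w = image.shape[:2]
    base = predict(image)[target_class]
    heat = np.zeros((h // stride, w // stride))
    for i, y in enumerate(range(0, h - patch + 1, stride)):
        for j, x in enumerate(range(0, w - patch + 1, stride)):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = 0.0
            # importance = how much masking this region hurts the prediction
            heat[i, j] = base - predict(occluded)[target_class]
    return heat
```

Large values in `heat` mark regions whose removal hurts the prediction most, i.e. parts of the image the model relies on; the same idea applies to questions by deleting one word at a time.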


Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded
This work proposes a generic approach called Human Importance-aware Network Tuning (HINT), which effectively leverages human demonstrations to improve visual grounding and encourages deep networks to be sensitive to the same input regions as humans.
What's in a Question: Using Visual Questions as a Form of Supervision
This work proposes two simple but surprisingly effective modifications to standard visual question answering models that allow them to use weak supervision in the form of unanswered questions associated with images, and demonstrates that a simple data-augmentation strategy inspired by these insights yields a 7.1% improvement on the standard VQA benchmark.
Choose Your Neuron: Incorporating Domain Knowledge through Neuron-Importance
This work learns to map domain knowledge about novel "unseen" classes onto a dictionary of learned concepts and optimizes for network parameters that can effectively combine these concepts, essentially learning classifiers by discovering and composing learned semantic concepts in deep networks.
Nothing Else Matters: Model-Agnostic Explanations By Identifying Prediction Invariance
This work proposes anchor-LIME (aLIME), a model-agnostic technique that produces high-precision rule-based explanations with clearly defined coverage boundaries. aLIME is compared to linear LIME in simulated experiments, and its flexibility is demonstrated with qualitative examples from a variety of domains and tasks.
Explanation in Human-AI Systems: A Literature Meta-Review, Synopsis of Key Ideas and Publications, and Bibliography for Explainable AI
This report lays out the explainability issues and challenges in modern AI, presents capsule views of the leading psychological theories of explanation, and encourages AI/XAI researchers to include fuller details of their empirical or experimental methods in their research reports.
Vector Field Neural Networks
A new architecture, Vector Field Neural Networks (VFNN), is proposed based on a new interpretation of neural networks in which the vector field becomes explicit, using the idea of an implicit vector field moving data as particles in a flow.
Developing a generalized intelligent agent by processing information on webpages
A framework for reinforcement learning (RL) agents to interact with a web environment that provides an agent with rich features including element positioning, color, and size in order to process text represented in a 2D web space.
Examining the effect of explanation on satisfaction and trust in AI diagnostic systems
This paper examines the effectiveness of explanations offered for AI systems in the healthcare domain across two simulation experiments and provides some design recommendations for the explanations offered.
Vector Field Based Neural Networks
A novel neural network architecture is proposed using the mathematically and physically rich idea of vector fields as hidden layers to perform nonlinear transformations of the data.


Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images
We address a question answering task on real-world images that is set up as a Visual Turing Test. By combining the latest advances in image representation and natural language processing, we propose…
Exploring Models and Data for Image Question Answering
This work proposes to use neural networks and visual semantic embeddings, without intermediate stages such as object detection and image segmentation, to predict answers to simple questions about images, and presents a question generation algorithm that converts image descriptions into QA form.
Visualizing and Understanding Convolutional Networks
A novel visualization technique is introduced that gives insight into the function of intermediate feature layers and the operation of the classifier in large convolutional network models; used in a diagnostic role, it helps find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark.
Large-scale Simple Question Answering with Memory Networks
This paper studies the impact of multitask and transfer learning for simple question answering, a setting in which the reasoning required to answer is quite easy as long as one can retrieve the correct evidence for a question, which can be difficult at large scale.
VQA: Visual Question Answering
We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer.
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
This work argues for the usefulness of a set of proxy tasks that evaluate reading comprehension via question answering, and classify these tasks into skill sets so that researchers can identify (and then rectify) the failings of their systems.
Going deeper with convolutions
We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge.
Sequence to Sequence Learning with Neural Networks
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions about sequence structure. It finds that reversing the order of the words in all source sentences markedly improved the LSTM's performance, because doing so introduced many short-term dependencies between the source and target sentences, making the optimization problem easier.
What has my classifier learned? Visualizing the classification rules of bag-of-feature model by support region detection
Lingqiao Liu, Lei Wang · 2012 IEEE Conference on Computer Vision and Pattern Recognition (2012)
This work developed an efficient RSRS detection algorithm and showed that it can be used to identify the limitations of a classifier, predict its failure modes, discover classification rules, and reveal database bias.
How to Explain Individual Classification Decisions
This paper proposes a procedure that, under a set of assumptions, can explain the decisions of any classification method.