Publications
VQA: Visual Question Answering
We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer.
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
GVQA explicitly disentangles the recognition of visual concepts present in the image from the identification of the plausible answer space for a given question, enabling the model to generalize more robustly across different distributions of answers.
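To make the disentanglement concrete, here is a minimal, hypothetical sketch (not the paper's GVQA implementation): one branch scores answers from image features, another predicts a plausible-answer mask from the question, and the two are multiplied so the model can only answer within the question's plausible answer space. All module names and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DisentangledVQA(nn.Module):
    # Illustrative only: one branch recognises visual concepts from the image,
    # the other predicts which answers are plausible for the question; the
    # element-wise product restricts final scores to the plausible answer space.
    def __init__(self, img_dim, q_dim, num_answers):
        super().__init__()
        self.visual_branch = nn.Linear(img_dim, num_answers)        # image features -> answer scores
        self.answer_space_branch = nn.Linear(q_dim, num_answers)    # question features -> plausibility logits

    def forward(self, img_feat, q_feat):
        concept_scores = self.visual_branch(img_feat)                    # what is visible in the image
        plausibility = torch.sigmoid(self.answer_space_branch(q_feat))   # which answers fit the question
        return concept_scores * plausibility                             # gate visual evidence by plausibility
```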
Visual Storytelling
Modelling concrete description as well as figurative and social language, as provided in this dataset and the storytelling task, has the potential to move artificial intelligence from a basic understanding of typical visual scenes towards a more human-like understanding of grounded event structure and subjective expression.
Overcoming Language Priors in Visual Question Answering with Adversarial Regularization
This work introduces a question-only model that takes as input the question encoding from the VQA model and must leverage language biases in order to succeed, and poses training as an adversarial game between the VQA model and this question-only adversary, discouraging the VQA model from capturing language biases in its question encoding.
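As a rough illustration of such an adversarial game (a sketch under assumptions, not the paper's implementation), a question-only classifier can be attached to the VQA model's question encoding through a gradient-reversal layer: the classifier tries to predict the answer from the question alone, while the reversed gradient pushes the encoder to discard answer-predictive language bias. All names, shapes, and the reversal-based setup below are illustrative.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    # Identity on the forward pass, negated (scaled) gradient on the backward pass.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class QuestionOnlyAdversary(nn.Module):
    # Tries to predict the answer from the question encoding alone; training it
    # through gradient reversal discourages the VQA question encoder from
    # retaining answer-predictive (biased) information.
    def __init__(self, q_dim, num_answers, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.classifier = nn.Sequential(
            nn.Linear(q_dim, q_dim), nn.ReLU(), nn.Linear(q_dim, num_answers)
        )

    def forward(self, q_encoding):
        reversed_q = GradReverse.apply(q_encoding, self.lambd)
        return self.classifier(reversed_q)

# Hypothetical training step: add the adversary's loss to the usual VQA loss,
# so the adversary learns the language bias while the encoder unlearns it.
# loss = vqa_loss + cross_entropy(adversary(q_encoding), answers)
```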
Analyzing the Behavior of Visual Question Answering Models
Today's VQA models are "myopic" (they tend to fail on sufficiently novel instances), often "jump to conclusions" (converging on a predicted answer after 'listening' to just half the question), and are "stubborn" (they do not change their answers across images).
C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset
This paper proposes a new setting for Visual Question Answering in which the test question-answer pairs are compositionally novel compared to the training question-answer pairs, and presents a new compositional split of the VQA v1.0 dataset, called Compositional VQA (C-VQA).
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
This work balances the popular VQA dataset by collecting complementary images such that every question in the resulting balanced dataset is associated with not just a single image, but rather a pair of similar images that result in two different answers to the question.
Resolving Language and Vision Ambiguities Together: Joint Segmentation & Prepositional Attachment Resolution in Captioned Scenes
This work presents an approach that simultaneously performs semantic segmentation and prepositional phrase attachment resolution for captioned images, and shows that joint reasoning produces more accurate results than either module operating in isolation.
Measuring Machine Intelligence Through Visual Question Answering
A case study explores the recently popular task of image captioning and its limitations as a measure of machine intelligence, and argues that Visual Question Answering is an alternative and more promising task for testing a machine's ability to reason about language and vision.