Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
- Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, Devi Parikh
- Computer Science · Computer Vision and Pattern Recognition
- 2 December 2016
This work balances the popular VQA dataset by collecting complementary images such that every question in the authors' balanced dataset is associated with not just a single image, but rather a pair of similar images that result in two different answers to the question.
Yin and Yang: Balancing and Answering Binary Visual Questions
- Peng Zhang, Yash Goyal, Douglas Summers-Stay, Dhruv Batra, Devi Parikh
- Computer Science · Computer Vision and Pattern Recognition
- 16 November 2015
This paper addresses binary Visual Question Answering on abstract scenes as visual verification of concepts inquired in the questions by converting the question to a tuple that concisely summarizes the visual concept to be detected in the image.
Counterfactual Visual Explanations
- Yash Goyal, Ziyan Wu, Jan Ernst, Dhruv Batra, Devi Parikh, Stefan Lee
- Computer Science · International Conference on Machine Learning
- 16 April 2019
The effectiveness of counterfactual explanations in teaching humans is explored, and it is found that users trained to distinguish bird species fare better when given access to counterfactual explanations in addition to training examples.
Explaining Classifiers with Causal Concept Effect (CaCE)
- Yash Goyal, Amir Feder, Uri Shalit, Been Kim
- Computer Science · ArXiv
- 16 July 2019
This work defines the Causal Concept Effect (CaCE) as the causal effect of a human-interpretable concept on a deep neural net's predictions, and shows that the CaCE measure can avoid errors stemming from confounding.
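The quantity described above can be illustrated with a minimal numeric sketch: CaCE is the difference in the classifier's expected prediction when a concept is switched on versus off by intervention, with other generative factors sampled freely. The generator, classifier, and concept here are all toy stand-ins invented for illustration, not the paper's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_features(concept, n=10_000):
    """Hypothetical generative model: intervene on a binary concept
    while other (potentially confounding) factors vary freely."""
    other = rng.normal(size=n)           # background factors
    return 2.0 * concept + 0.5 * other   # toy 1-D "image feature"

def classifier(x):
    """Toy classifier score (stand-in for a deep net's output)."""
    return 1.0 / (1.0 + np.exp(-x))

# CaCE: E[f(x) | do(concept=1)] - E[f(x) | do(concept=0)]
cace = (classifier(generate_features(1)).mean()
        - classifier(generate_features(0)).mean())
print(round(cace, 2))
```

Because the concept is set by intervention inside the generator rather than observed, the estimate is not biased by correlations between the concept and the background factors.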
Towards Transparent AI Systems: Interpreting Visual Question Answering Models
- Yash Goyal, Akrit Mohapatra, Devi Parikh, Dhruv Batra
- Computer Science
- 31 August 2016
The problem of interpreting Visual Question Answering (VQA) models is addressed, and it is found that even without explicit attention mechanisms, VQA models may sometimes be implicitly attending to relevant regions in the image, and often to appropriate words in the question.
Question-Conditioned Counterfactual Image Generation for VQA
- Jingjing Pan, Yash Goyal, Stefan Lee
- Computer Science · ArXiv
- 14 November 2019
This ongoing work proposes learning to generate counterfactual images for a VQA model: given a question-image pair, the model is asked to generate a new image such that (i) the VQA model outputs a different answer, (ii) the new image is minimally different from the original, and (iii) the new image is realistic.
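The three criteria above naturally combine into a single training objective. The sketch below is a hypothetical composite loss assuming one plausible instantiation (cross-entropy toward a different answer, an L2 proximity term, and a realism critic); the actual loss terms and weights in the paper may differ.

```python
import numpy as np

def counterfactual_objective(vqa_logits, target_answer, new_img, orig_img,
                             realism_score, w_dist=1.0, w_real=1.0):
    """Toy composite objective for the three criteria:
    (i) flip the VQA answer, (ii) stay close to the original image,
    (iii) keep the generated image realistic."""
    # (i) cross-entropy pushing the VQA model toward a different answer
    probs = np.exp(vqa_logits) / np.exp(vqa_logits).sum()
    answer_loss = -np.log(probs[target_answer])
    # (ii) L2 penalty for deviating from the original image
    dist_loss = np.mean((new_img - orig_img) ** 2)
    # (iii) penalty from a realism critic (higher score = more realistic)
    real_loss = -realism_score
    return answer_loss + w_dist * dist_loss + w_real * real_loss

logits = np.array([1.0, 2.0])
img = np.zeros(4)
low_real = counterfactual_objective(logits, 0, img, img, realism_score=1.0)
hi_real = counterfactual_objective(logits, 0, img, img, realism_score=2.0)
```

As expected, the objective decreases when the counterfactual image is judged more realistic, all else equal.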
CloudCV: Large-Scale Distributed Computer Vision as a Cloud Service
- Harsh Agrawal, Clint Solomon Mathialagan, Dhruv Batra
- Computer Science · Mobile Cloud Visual Media Computing
- 12 June 2015
The goal is to democratize computer vision; one should not have to be a computer vision, big data and distributed computing expert to have access to state-of-the-art distributed computer vision algorithms.
Image Retrieval from Contextual Descriptions
- Benno Krojer, Vaibhav Adlakha, Vibhav Vineet, Yash Goyal, E. Ponti, Siva Reddy
- Computer Science · Annual Meeting of the Association for…
- 29 March 2022
A new multimodal challenge, Image Retrieval from Contextual Descriptions (ImageCoDe), tasks models with retrieving the correct image from a set of 10 minimally contrastive candidates based on a contextual description; results reveal that current models dramatically lag behind human performance.
Resolving Language and Vision Ambiguities Together: Joint Segmentation & Prepositional Attachment Resolution in Captioned Scenes
- Gordon A. Christie, A. Laddha, Dhruv Batra
- Computer Science · Conference on Empirical Methods in Natural…
- 7 April 2016
This work presents an approach to simultaneously perform semantic segmentation and prepositional phrase attachment resolution for captioned images and shows that joint reasoning produces more accurate results than any module operating in isolation.
Interpreting Visual Question Answering Models
- Yash Goyal, Akrit Mohapatra, Devi Parikh, Dhruv Batra
- Computer Science · ArXiv
- 31 August 2016
This paper uses two visualization techniques -- guided backpropagation and occlusion -- to find important words in the question and important regions in the image and presents qualitative and quantitative analyses of these importance maps.
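Of the two techniques named above, occlusion is the simpler to illustrate: slide a patch over the image and record how much the model's score drops when each region is hidden. The sketch below uses a toy model and image invented for illustration; the paper applies this idea to full VQA models.

```python
import numpy as np

def occlusion_map(image, model, patch=4, baseline=0.0):
    """Importance of a region = drop in the model's score when
    that region is replaced by a constant baseline value."""
    base_score = model(image)
    h, w = image.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline
            heat[i // patch, j // patch] = base_score - model(occluded)
    return heat

# Toy model that only "looks at" the top-left corner of the image.
model = lambda img: img[:4, :4].sum()
image = np.ones((8, 8))
heat = occlusion_map(image, model, patch=4)
```

Only the top-left cell of the resulting heat map is nonzero, correctly localizing the region the toy model depends on.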