Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
@article{Goyal2018MakingTV,
  title={Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering},
  author={Yash Goyal and Tejas Khot and Aishwarya Agrawal and Douglas Summers-Stay and Dhruv Batra and Devi Parikh},
  journal={International Journal of Computer Vision},
  year={2018},
  volume={127},
  pages={398--414}
}
The problem of visual question answering (VQA) is of significant importance both as a challenging research question and for the rich set of applications it enables. […] This can help in building trust for machines among their users.
1,379 Citations
Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder
- Computer Science, ECCV
- 2020
This work proposes a novel model-agnostic question encoder, Visually-Grounded Question Encoder (VGQE), for VQA that reduces the dependency of the model on the language priors, and achieves state-of-the-art results on the bias-sensitive split of the VQAv2 dataset.
RUBi: Reducing Unimodal Biases in Visual Question Answering
- Computer Science, NeurIPS
- 2019
RUBi, a new learning strategy to reduce biases in any VQA model, is proposed, which reduces the importance of the most biased examples, i.e. examples that can be correctly classified without looking at the image.
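The re-weighting idea described above can be illustrated with a small, hedged sketch (not the authors' exact formulation; the branch names, the sigmoid mask, and the auxiliary term are assumptions based on the summary): a question-only branch predicts answers from the question alone, and its confidence modulates the fused logits so that examples the language prior already gets right contribute less to the main loss.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def rubi_style_loss(fused_logits, question_only_logits, labels):
    # Sigmoid of the question-only logits acts as a mask on the fused
    # (vision+language) logits: examples the language prior answers
    # confidently are down-weighted in the main branch's gradient.
    mask = 1.0 / (1.0 + np.exp(-question_only_logits))
    modulated = fused_logits * mask
    n = labels.shape[0]
    p = softmax(modulated)
    main_loss = -np.log(p[np.arange(n), labels] + 1e-12).mean()
    # Auxiliary loss keeps the question-only branch a usable bias probe.
    q = softmax(question_only_logits)
    aux_loss = -np.log(q[np.arange(n), labels] + 1e-12).mean()
    return main_loss + aux_loss
```

The key design choice is that the mask is applied only during training; at test time the fused branch is used on its own, so the model cannot fall back on the language prior.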
Visual Question Generation as Dual Task of Visual Question Answering
- Computer Science, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
This paper proposes an end-to-end unified model, the Invertible Question Answering Network (iQAN), which introduces question generation as a dual task of question answering to improve VQA performance, and shows that the proposed dual training framework consistently improves many popular VQA architectures.
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
- Computer Science, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
This paper addresses the task of knowledge-based visual question answering and provides a benchmark, called OK-VQA, where the image content is not sufficient to answer the questions, encouraging methods that rely on external knowledge resources.
Overcoming language priors in VQA via adding visual module
- Computer Science, Neural Computing and Applications
- 2022
This work proposes a method that strengthens the visual module so that visual content has a greater impact on answers in VQA, demonstrates the method's effectiveness, and further improves the accuracy of several different models.
Inverse Visual Question Answering: A New Benchmark and VQA Diagnosis Tool
- Computer Science, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2020
This paper proposes a variational iVQA model that can generate diverse, grammatically correct, and content-correlated questions that match the given answer, and shows that iVQA is an interesting benchmark for visuo-linguistic understanding and a more challenging alternative to VQA, because an iVQA model needs to understand the image better to be successful.
VQG for VQA
- Computer Science
- 2017
This work proposes an end-to-end unified framework, the Invertible Question Answering Network (iQAN), to leverage the complementary relations between questions and answers in images by jointly training the model on VQA and VQG tasks.
Estimating semantic structure for the VQA answer space
- Computer Science, arXiv
- 2020
This work proposes two measures of proximity between VQA classes and a corresponding loss that takes the estimated proximity into account, and shows that the approach is completely model-agnostic, yielding consistent improvements with three different VQA models.
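A proximity-aware loss of the kind this summary describes can be sketched as follows (a hedged illustration, not the paper's actual measures: here class proximity is assumed to come from cosine similarity of answer embeddings, and the loss is the expected cost of the predicted distribution under that proximity):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def proximity_weighted_loss(logits, labels, answer_embeddings):
    # Normalize answer embeddings and derive a cost matrix from cosine
    # similarity: predicting a class close to the ground truth is cheap,
    # predicting a distant class is expensive.
    emb = answer_embeddings / np.linalg.norm(answer_embeddings, axis=1, keepdims=True)
    cost = 1.0 - emb @ emb.T  # zero on the diagonal
    p = softmax(logits)
    # Expected cost of the predicted distribution w.r.t. the true class.
    return (p * cost[labels]).sum(axis=1).mean()
```

Because the loss only consumes logits and a class-proximity matrix, it can be added to any VQA classifier's objective, which is what makes this kind of approach model-agnostic.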
An experimental study of the vision-bottleneck in VQA
- Computer Science, SSRN Electronic Journal
- 2022
This work presents an in-depth study of the vision bottleneck in VQA, experimenting with both the quantity and quality of visual objects extracted from images, and studies the impact of two methods of incorporating the object information needed to answer a question: directly in the reasoning module, and earlier, in the object selection stage.
VC-VQA: Visual Calibration Mechanism For Visual Question Answering
- Computer Science, 2020 IEEE International Conference on Image Processing (ICIP)
- 2020
The proposed model reconstructs image features from the predicted answer and the question, then measures the similarity between the reconstructed and original image features, which guides the VQA model in predicting the final answer.
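The calibration step this summary describes can be sketched roughly as follows (a hedged illustration, not the paper's architecture: `reconstructed_feats[i]` is assumed to come from some decoder that maps candidate answer *i* plus the question back to an image feature, and cosine similarity with the true image feature rescales the answer scores):

```python
import numpy as np

def calibrate_answer_scores(answer_scores, original_feat, reconstructed_feats):
    # Answers whose implied image disagrees with the actual image are
    # down-weighted by their cosine similarity to the original feature.
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    sims = np.array([cos(original_feat, r) for r in reconstructed_feats])
    calibrated = answer_scores * sims
    return calibrated / calibrated.sum()  # renormalize to a distribution
```

The intuition is that an answer consistent with the image should reconstruct a feature close to the original one, so the similarity term acts as a visual sanity check on the classifier's confidence.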