Visual Question Answering on Image Sets

  • Ankan Bansal, Y. Zhang, R. Chellappa
  • Published 2020
  • Computer Science
  • ArXiv
  • We introduce the task of Image-Set Visual Question Answering (ISVQA), which generalizes the commonly studied single-image VQA problem to multi-image settings. Taking a natural language question and a set of images as input, it aims to answer the question based on the content of the images. The questions can be about objects and relationships in one or more images or about the entire scene depicted by the image set. To enable research in this new topic, we introduce two ISVQA datasets - indoor…



    OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
    Hierarchical Question-Image Co-Attention for Visual Question Answering
    Focal Visual-Text Attention for Visual Question Answering
    Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
    VQA: Visual Question Answering
    Visual7W: Grounded Question Answering in Images
    VizWiz Grand Challenge: Answering Visual Questions from Blind People
    Towards VQA Models That Can Read
    Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
    TVQA+: Spatio-Temporal Grounding for Video Question Answering