OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge

@article{Marino2019OKVQAAV,
  title={OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge},
  author={Kenneth Marino and Mohammad Rastegari and Ali Farhadi and Roozbeh Mottaghi},
  journal={ArXiv},
  year={2019},
  volume={abs/1906.00067}
}
  • Kenneth Marino, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi
  • Published in ArXiv 2019
Visual Question Answering (VQA) in its ideal form lets us study reasoning in the joint space of vision and language and serves as a proxy for the AI task of scene understanding. However, most VQA benchmarks to date are focused on questions such as simple counting, visual attributes, and object detection that do not require reasoning or knowledge beyond what is in the image. In this paper, we address the task of knowledge-based visual question answering and provide a benchmark, called OK-VQA…

References

Publications referenced by this paper (52 in total; a selection is listed below).

Bilinear Attention Networks

  • Highly influential

FVQA: Fact-Based Visual Question Answering

Peng Wang, Qi Wu, Chunhua Shen, Anthony R. Dick, Anton van den Hengel
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2018
  • Highly influential

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

  • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2017
  • Highly influential

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

  • International Journal of Computer Vision
  • 2017
  • Highly influential

Visual Madlibs: Fill in the Blank Description Generation and Question Answering

  • 2015 IEEE International Conference on Computer Vision (ICCV)
  • 2015
  • Highly influential

MUTAN: Multimodal Tucker Fusion for Visual Question Answering

  • 2017 IEEE International Conference on Computer Vision (ICCV)
  • 2017
  • Highly influential

MovieQA: Understanding Stories in Movies through Question-Answering

  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016
  • Highly influential

Visual7W: Grounded Question Answering in Images

  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016
  • Highly influential

VQA: Visual Question Answering

  • Highly influential
