OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge

@inproceedings{Marino2019OKVQAAV,
  title={OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge},
  author={Kenneth Marino and Mohammad Rastegari and Ali Farhadi and Roozbeh Mottaghi},
  booktitle={2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019},
  pages={3190-3199}
}
Visual Question Answering (VQA) in its ideal form lets us study reasoning in the joint space of vision and language and serves as a proxy for the AI task of scene understanding. However, most VQA benchmarks to date focus on questions, such as simple counting, visual attributes, and object detection, that require no reasoning or knowledge beyond what is in the image. In this paper, we address the task of knowledge-based visual question answering and provide a benchmark, called OK-VQA…
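The page excerpt does not show the benchmark's evaluation protocol. As a point of reference, below is a minimal sketch of the soft-accuracy metric introduced by the original VQA benchmark, which knowledge-based follow-ups such as OK-VQA commonly adopt: a predicted answer earns min(#annotators who gave it / 3, 1). The function name, the 10-annotator example, and the leave-one-out averaging follow the public VQA evaluation convention; whether OK-VQA uses the identical averaging is an assumption here, not something stated on this page.

```python
def vqa_soft_accuracy(prediction: str, human_answers: list[str]) -> float:
    """Soft accuracy: credit grows with annotator agreement, capped at 1.

    Only lowercasing/whitespace normalization is done here; the real VQA
    evaluation code also normalizes punctuation, articles, and number words.
    """
    pred = prediction.strip().lower()
    answers = [a.strip().lower() for a in human_answers]
    # Leave-one-out averaging over annotators, as in the VQA evaluation code:
    # each annotator's answer is held out and the prediction is scored
    # against the remaining answers, then the scores are averaged.
    scores = []
    for i in range(len(answers)):
        others = answers[:i] + answers[i + 1:]
        scores.append(min(others.count(pred) / 3.0, 1.0))
    return sum(scores) / len(scores)

# Example: 2 of 10 annotators agree with the prediction -> 0.6, partial credit.
print(vqa_soft_accuracy("grass", ["grass"] * 2 + ["lawn"] * 8))
```

The cap at three agreeing annotators means a prediction matching three or more humans scores full credit, which makes the metric robust to annotator disagreement on open-ended answers.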
