Graph-Structured Representations for Visual Question Answering


This paper proposes to improve visual question answering (VQA) with structured representations of both scene contents and questions. A key challenge in VQA is that it requires joint reasoning over the visual and text domains. The predominant CNN/LSTM-based approach to VQA is limited by monolithic vector representations that largely ignore structure in the scene…
DOI: 10.1109/CVPR.2017.344


