Graph-Structured Representations for Visual Question Answering


This paper proposes to improve visual question answering (VQA) with structured representations of both scene contents and questions. A key challenge in VQA is that it requires joint reasoning over the visual and text domains. The predominant CNN/LSTM-based approach to VQA is limited by monolithic vector representations that largely ignore structure in the scene…
DOI: 10.1109/CVPR.2017.344


7 Figures and Tables



Citation Velocity: 47

Averaging 47 citations per year over the last 2 years.
