Structured Attentions for Visual Question Answering

  title={Structured Attentions for Visual Question Answering},
  author={Yiyang Zhuang and Yanpeng Zhao and Shuaiyi Huang and Kewei Tu and Yi Ma},
  journal={2017 IEEE International Conference on Computer Vision (ICCV)},
Visual attention, which assigns weights to image regions according to their relevance to a question, is considered an indispensable part of most Visual Question Answering models. Although questions may involve complex relations among multiple regions, few attention models can effectively encode such cross-region relations. In this paper, we demonstrate the importance of encoding such relations by showing the limited effective receptive field of ResNet on two datasets, and propose to…
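To make the notion of visual attention in the abstract concrete, the following is a minimal sketch of plain soft attention, assuming dot-product relevance scores and a softmax over regions. It is not the structured attention the paper proposes; the function name `soft_attention` and the toy feature vectors are hypothetical and chosen only for illustration.

```python
import math

def soft_attention(region_feats, question_vec):
    """Dot-product soft attention over image regions (illustrative sketch only)."""
    # Relevance score of each region: dot product with the question vector.
    scores = [sum(r * q for r, q in zip(feat, question_vec))
              for feat in region_feats]
    # Numerically stable softmax turns the scores into weights that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attended feature: weighted sum of the region features.
    dim = len(region_feats[0])
    attended = [sum(w * feat[d] for w, feat in zip(weights, region_feats))
                for d in range(dim)]
    return weights, attended

# Toy input (assumed): three regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question = [2.0, 0.0]
weights, attended = soft_attention(regions, question)
```

In this sketch each region's score depends only on that region and the question; regions interact only through the softmax normalization. That is one way to read the limitation the abstract describes, since no term relates one region to another.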