Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering

@inproceedings{Xu2016AskAA,
  title={Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering},
  author={Huijuan Xu and Kate Saenko},
  booktitle={ECCV},
  year={2016}
}
The problem of Visual Question Answering (VQA) requires joint image and language understanding to answer a question about a given photograph. Recent approaches have applied deep image captioning methods based on recurrent LSTM networks to this problem, but have failed to model spatial inference. In this paper, we propose a memory network with spatial attention for the VQA task. Memory networks are recurrent neural networks with an explicit attention mechanism that selects certain parts of the…
Highly Influential
This paper has highly influenced 17 other papers.
Highly Cited
This paper has 277 citations.
Related Discussions
This paper has been referenced on Twitter 16 times.

Citations

Publications citing this paper.

Focal Visual-Text Attention for Memex Question Answering.

IEEE Transactions on Pattern Analysis and Machine Intelligence • 2019
Highly Influenced

A Better Way to Attend: Attention With Trees for Video Question Answering

IEEE Transactions on Image Processing • 2018
Highly Influenced

Bilinear Attention Networks

Highly Influenced

278 Citations

[Chart: Citations per Year; x-axis years '14 to '18, y-axis 0 to 150]
Semantic Scholar estimates that this publication has 278 citations based on the available data.

