Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering
@inproceedings{Xu2016AskAA,
  title={Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering},
  author={Huijuan Xu and Kate Saenko},
  booktitle={ECCV},
  year={2016}
}
- Published 2016 in ECCV
DOI:10.1007/978-3-319-46478-7_28
The problem of Visual Question Answering (VQA) requires joint image and language understanding to answer a question about a given photograph. Recent approaches have applied deep image captioning methods based on recurrent LSTM networks to this problem, but have failed to model spatial inference. In this paper, we propose a memory network with spatial attention for the VQA task. Memory networks are recurrent neural networks with an explicit attention mechanism that selects certain parts of the…
Citations
Publications citing this paper.
Showing 1-10 of 200 extracted citations
Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets
Multimodal Attention in Recurrent Neural Networks for Visual Question Answering
Locally Smoothed Neural Networks
Hierarchical Co-Attention for Visual Question Answering
Hierarchical Question-Image Co-Attention for Visual Question Answering
Visual Question Answering using Natural Language Object Retrieval and Saliency Cues
Focal Visual-Text Attention for Memex Question Answering
A Better Way to Attend: Attention With Trees for Video Question Answering
Bilinear Attention Networks
Learn To Pay Attention
Citation Statistics
278 Citations
Semantic Scholar estimates that this publication has 278 citations based on the available data.
References
Publications referenced by this paper.
Showing 1-10 of 29 references
End-To-End Memory Networks
Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images
Exploring Models and Data for Image Question Answering
VQA: Visual Question Answering
A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input
Caffe: Convolutional Architecture for Fast Feature Embedding
Memory Networks
Microsoft COCO: Common Objects in Context