Multi-level Attention Networks for Visual Question Answering

Dongfei Yu, Jianlong Fu, Tao Mei, Yong Rui
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Inspired by the recent success of text-based question answering, visual question answering (VQA) has been proposed to automatically answer natural language questions with reference to a given image. Compared with text-based QA, VQA is more challenging because reasoning in the visual domain requires both effective semantic embedding and fine-grained visual understanding. Existing approaches predominantly infer answers from abstract low-level visual features, while neglecting the modeling…



