Learning to Reason: End-to-End Module Networks for Visual Question Answering

Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Kate Saenko
2017 IEEE International Conference on Computer Vision (ICCV)
Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems. For example, to answer “is there an equal number of balls and boxes?” we can look for balls, look for boxes, count them, and compare the results. The recently proposed Neural Module Network (NMN) architecture [3, 2] implements this approach to question answering by parsing questions into linguistic substructures and assembling question…
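The decomposition in the balls-and-boxes example can be illustrated with a toy, purely symbolic sketch. It is not the paper's model: the modules in an NMN are learned neural components operating on image features, whereas the functions below work on a hand-written list of labels. The scene format and the names `find`, `count`, `equal`, and `answer_equal_number` are assumptions made only for illustration.

```python
# Toy, symbolic stand-in for the decomposition in the example question.
# The scene format and all function names are hypothetical and are not
# taken from the paper, whose modules are neural networks over image features.

def find(scene, category):
    """Return the objects in the scene whose label matches `category`."""
    return [obj for obj in scene if obj["label"] == category]


def count(objects):
    """Return how many objects were found."""
    return len(objects)


def equal(a, b):
    """Compare two counts."""
    return a == b


def answer_equal_number(scene, first, second):
    """Compose the steps: find both categories, count each, compare."""
    return equal(count(find(scene, first)), count(find(scene, second)))


# A hand-written scene with two balls, two boxes, and one cone.
scene = [
    {"label": "ball"}, {"label": "box"},
    {"label": "ball"}, {"label": "box"},
    {"label": "cone"},
]

result = answer_equal_number(scene, "ball", "box")
```

Each sub-problem (look for, count, compare) is a separate reusable function, and the answer comes from composing them in the order the question implies.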

Semantic Scholar estimates that this publication has 97 citations based on the available data.
