Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension

@inproceedings{Kembhavi2017AreYS,
  title={Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension},
  author={Aniruddha Kembhavi and Min Joon Seo and Dustin Schwenk and Jonghyun Choi and Ali Farhadi and Hannaneh Hajishirzi},
  booktitle={2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2017},
  pages={5376--5384}
}
We introduce the task of Multi-Modal Machine Comprehension (M3C), which aims at answering multimodal questions given a context of text, diagrams and images. We present the Textbook Question Answering (TQA) dataset that includes 1,076 lessons and 26,260 multi-modal questions, taken from middle school science curricula. Our analysis shows that a significant portion of questions require complex parsing of the text and the diagrams and reasoning, indicating that our dataset is more complex compared…

