Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension
@article{Kembhavi2017AreYS,
  title={Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension},
  author={Aniruddha Kembhavi and Minjoon Seo and D. Schwenk and Jonghyun Choi and Ali Farhadi and Hannaneh Hajishirzi},
  journal={2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2017},
  pages={5376-5384}
}
We introduce the task of Multi-Modal Machine Comprehension (M3C), which aims at answering multimodal questions given a context of text, diagrams and images. [...] We extend state-of-the-art methods for textual machine comprehension and visual question answering to the TQA dataset. Our experiments show that these models do not perform well on TQA. The presented dataset opens new challenges for research in question answering and reasoning across multiple modalities.
92 Citations
Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences
- Computer Science
- NAACL-HLT
- 2018
- 127
Textbook Question Answering Under Instructor Guidance with Memory Networks
- Computer Science
- 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
- 7
- Highly Influenced
ISAAQ - Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention
- Computer Science
- EMNLP
- 2020
- Highly Influenced
Textbook Question Answering with Multi-modal Context Graph Understanding and Self-supervised Open-set Comprehension
- Computer Science
- ACL
- 2019
- 3
Answering Questions about Data Visualizations using Efficient Bimodal Fusion
- Computer Science
- 2020 IEEE Winter Conference on Applications of Computer Vision (WACV)
- 2020
- 12
RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes
- Computer Science
- EMNLP
- 2018
- 50
Diverse Visuo-Lingustic Question Answering (DVLQA) Challenge
- Computer Science
- ArXiv
- 2020
- Highly Influenced
References
SHOWING 1-10 OF 33 REFERENCES
MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text
- Computer Science
- EMNLP
- 2013
- 479
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
- Computer Science, Mathematics
- ICLR
- 2016
- 779
Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question
- Computer Science
- NIPS
- 2015
- 350
VQA: Visual Question Answering
- Computer Science
- 2015 IEEE International Conference on Computer Vision (ICCV)
- 2015
- 2,008
- Highly Influential
Dynamic Memory Networks for Visual and Textual Question Answering
- Computer Science
- ICML
- 2016
- 559
- Highly Influential
MovieQA: Understanding Stories in Movies through Question-Answering
- Computer Science
- 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
- 339
A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task
- Computer Science
- ACL
- 2016
- 428
Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images
- Computer Science
- 2015 IEEE International Conference on Computer Vision (ICCV)
- 2015
- 462