Answering Visual What-If Questions: From Actions to Predicted Scene Descriptions

@article{Wagner2018AnsweringVW,
  title={Answering Visual What-If Questions: From Actions to Predicted Scene Descriptions},
  author={M. Wagner and H. Basevi and Rakshith Shetty and Wenbin Li and Mateusz Malinowski and Mario Fritz and A. Leonardis},
  journal={ArXiv},
  year={2018},
  volume={abs/1809.03707}
}
In-depth scene descriptions and question answering tasks have greatly increased the scope of today’s definition of scene understanding. [...] Our solution is a hybrid model which integrates a physics engine into a question answering architecture in order to anticipate future scene states resulting from object-object interactions caused by an action. We demonstrate first results on this challenging new problem and compare to baselines, where we outperform fully data-driven end-to-end learning…
