From Recognition to Cognition: Visual Commonsense Reasoning

@article{Zellers2018FromRT,
  title={From Recognition to Cognition: Visual Commonsense Reasoning},
  author={Rowan Zellers and Yonatan Bisk and Ali Farhadi and Yejin Choi},
  journal={ArXiv},
  year={2018},
  volume={abs/1811.10830}
}
Visual understanding goes well beyond object recognition. With one glance at an image, we can effortlessly imagine the world beyond the pixels: for instance, we can infer people's actions, goals, and mental states. While this task is easy for humans, it is tremendously difficult for today's vision systems, requiring higher-order cognition and commonsense reasoning about the world. We formalize this task as Visual Commonsense Reasoning. Given a challenging question about an image, a machine must…


Key Quantitative Results

  • Experiments on VCR show that R2C greatly outperforms state-of-the-art visual question-answering systems, obtaining 65% accuracy at question answering (Q→A), 67% at answer justification (QA→R), and 44% at staged answering and justification (Q→AR).
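The staged Q→AR number is lower than either individual accuracy because an example only counts as correct when both the chosen answer and the chosen rationale are right; when the two error types are roughly independent, it lands near the product of the two accuracies (0.65 × 0.67 ≈ 0.44). A minimal sketch of that scoring rule, assuming per-example boolean correctness lists (the function name and inputs are illustrative, not from the paper's code):

```python
def staged_accuracy(answer_correct, rationale_correct):
    """Staged Q->AR metric: an example scores only if BOTH the chosen
    answer and the chosen rationale are correct."""
    assert len(answer_correct) == len(rationale_correct)
    # Joint correctness per example: answer AND rationale both right.
    joint = [a and r for a, r in zip(answer_correct, rationale_correct)]
    return sum(joint) / len(joint)
```

For example, four examples with answers correct on {1, 2, 4} and rationales correct on {1, 3, 4} yield a staged accuracy of 2/4 = 0.5, even though each individual accuracy is 0.75.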


Citations

Publications citing this paper (showing 1-10 of 11).

Vision-and-Dialog Navigation
  • Cites background & methods
  • Highly influenced

Defending Against Neural Fake News
  • ArXiv, 2019
  • Cites background

References

Publications referenced by this paper (showing 1-10 of 98).

Hadamard Product for Low-rank Bilinear Pooling
Jin-Hwa Kim, Kyoung Woon On, +3 authors, Byoung-Tak Zhang
  • In the 5th International Conference on Learning Representations (ICLR), 2017
  • Highly influential

Long Short-Term Memory
Sepp Hochreiter and Jürgen Schmidhuber
  • Neural Computation, 1997
  • Highly influential

Deep contextualized word representations
Matthew E. Peters, et al.
  • In NAACL-HLT, 2018
  • Highly influential

MUTAN: Multimodal Tucker Fusion for Visual Question Answering
  • In the 2017 IEEE International Conference on Computer Vision (ICCV), 2017
  • Highly influential

Enhanced LSTM for Natural Language Inference
  • In ACL, 2017
  • Highly influential