From Recognition to Cognition: Visual Commonsense Reasoning

@inproceedings{Zellers2018FromRT,
  title={From Recognition to Cognition: Visual Commonsense Reasoning},
  author={Rowan Zellers and Yonatan Bisk and Ali Farhadi and Yejin Choi},
  booktitle={2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019},
  pages={6713-6724}
}
Visual understanding goes well beyond object recognition. [...] Next, we introduce a new dataset, VCR, consisting of 290k multiple-choice QA problems derived from 110k movie scenes. The key recipe for generating non-trivial and high-quality problems at scale is Adversarial Matching, a new approach to transform rich annotations into multiple-choice questions with minimal bias. Experimental results show that while humans find VCR easy (over 90% accuracy), state-of-the-art vision models struggle (~45%).
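
The Adversarial Matching recipe summarized above can be viewed as a maximum-weight bipartite matching problem: each question's wrong answers are recycled from other questions' correct answers, chosen to be relevant to the question yet not so similar to the ground-truth answer that the item becomes ambiguous. The sketch below is only an illustration of that trade-off under simplifying assumptions: it assigns a single distractor per question (the dataset uses several), the relevance and similarity scorers are stand-ins rather than the trained models the authors describe, and the lam weight and the adversarial_matching helper are made-up names. It uses scipy's linear-sum-assignment solver.

```python
# Illustrative sketch (not the authors' code) of the Adversarial Matching idea:
# distractors for each question are pulled from other questions' correct
# answers via maximum-weight bipartite matching, balancing relevance to the
# question against similarity to the correct answer.
import numpy as np
from scipy.optimize import linear_sum_assignment


def adversarial_matching(questions, answers, relevance, similarity, lam=0.5):
    """Pick one distractor per question from the pool of other answers.

    relevance(q, a)  -> score in (0, 1): how plausible answer a is for question q
    similarity(a, b) -> score in (0, 1): how close answer a is to answer b
    lam              -> trade-off between relevance and dissimilarity
    Returns a dict mapping question index i to the index j of its distractor.
    """
    n = len(questions)
    eps = 1e-8
    # Large negative weight on the diagonal forbids reusing a question's own answer.
    weights = np.full((n, n), -1e9)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            weights[i, j] = (np.log(relevance(questions[i], answers[j]) + eps)
                             + lam * np.log(1.0 - similarity(answers[i], answers[j]) + eps))
    # scipy's solver minimizes cost, so negate the weights to maximize them.
    rows, cols = linear_sum_assignment(-weights)
    return dict(zip(rows.tolist(), cols.tolist()))


# Toy usage with stand-in scorers; a real pipeline would plug in trained models.
if __name__ == "__main__":
    qs = ["Why is he smiling?", "What will she do next?", "Where are they?"]
    ans = ["He just won the game.", "She will leave the room.", "They are at a diner."]
    rel = lambda q, a: 0.6                       # stand-in: uniform relevance
    sim = lambda a, b: 0.1 if a != b else 1.0    # stand-in: low cross-similarity
    print(adversarial_matching(qs, ans, rel, sim))
```

The log(1 - similarity) term is what makes the matching "adversarial" in this sketch: candidate distractors that merely paraphrase the correct answer are penalized, so a model cannot succeed just by spotting near-duplicates, while the relevance term keeps the distractors plausible for the question.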

Citations

Publications citing this paper.
SHOWING 1-10 OF 47 CITATIONS

Connective Cognition Network for Directional Visual Commonsense Reasoning

CITES BACKGROUND & METHODS
HIGHLY INFLUENCED

TAB-VCR: Tags and Attributes based VCR Baselines

CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

VisualBERT: A Simple and Performant Baseline for Vision and Language

CITES BACKGROUND & METHODS
HIGHLY INFLUENCED

A Simple Baseline for Visual Commonsense Reasoning

CITES BACKGROUND, METHODS & RESULTS
HIGHLY INFLUENCED

Enforcing Reasoning in Visual Commonsense Reasoning

CITES METHODS, RESULTS & BACKGROUND
HIGHLY INFLUENCED

Fusion of Detected Objects in Text for Visual Question Answering

CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

Heterogeneous Graph Learning for Visual Commonsense Reasoning

CITES BACKGROUND & METHODS
HIGHLY INFLUENCED

Scene Graph Contextualization in Visual Commonsense Reasoning

CITES BACKGROUND, METHODS & RESULTS
HIGHLY INFLUENCED

VL-BERT: Pre-training of Generic Visual-Linguistic Representations

CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

References

Publications referenced by this paper.
SHOWING 1-10 OF 92 REFERENCES

Hadamard Product for Low-rank Bilinear Pooling

  • Jin-Hwa Kim, Kyoung Woon On, +3 authors, and Byoung-Tak Zhang. In the 5th International Conference on Learning Representations, 2017.
HIGHLY INFLUENTIAL

Long Short-Term Memory

HIGHLY INFLUENTIAL

Deep contextualized word representations

HIGHLY INFLUENTIAL

MUTAN: Multimodal Tucker Fusion for Visual Question Answering

HIGHLY INFLUENTIAL

Enhanced LSTM for Natural Language Inference

HIGHLY INFLUENTIAL

Inferring the Why in Images

HIGHLY INFLUENTIAL