From Recognition to Cognition: Visual Commonsense Reasoning

@inproceedings{Zellers2019FromRT,
  title={From Recognition to Cognition: Visual Commonsense Reasoning},
  author={Rowan Zellers and Yonatan Bisk and Ali Farhadi and Yejin Choi},
  booktitle={2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019},
  pages={6713--6724}
}
  • Rowan Zellers, Yonatan Bisk, Ali Farhadi, Yejin Choi
  • Published 2019
  • Computer Science
  • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • Visual understanding goes well beyond object recognition. [...] Next, we introduce a new dataset, VCR, consisting of 290k multiple-choice QA problems derived from 110k movie scenes. The key recipe for generating non-trivial and high-quality problems at scale is Adversarial Matching, a new approach to transform rich annotations into multiple-choice questions with minimal bias. Experimental results show that while humans find VCR easy (over 90% accuracy), state-of-the-art vision models struggle (~45…
    171 Citations

    Paper Mentions

    Enforcing Reasoning in Visual Commonsense Reasoning
    • 1
    • Highly Influenced
    Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning"
    • 5
    Project on Visual Commonsense Reasoning Anonymous ACL submission
    • 2019
    Computer vision beyond the visible : image understanding through language
    • 2
    • Highly Influenced
    A Simple Baseline for Visual Commonsense Reasoning
    • Highly Influenced
    SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings
    • 1
    Connective Cognition Network for Directional Visual Commonsense Reasoning
    • 4
    • Highly Influenced
    KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning
    • Highly Influenced
    Edited Media Understanding: Reasoning About Implications of Manipulated Images
    • 1
