Corpus ID: 235342524

iReason: Multimodal Commonsense Reasoning using Videos and Natural Language with Interpretability

Andrew Wang, Aman Chadha
Causality knowledge is vital to building robust AI systems. Deep learning models often perform poorly on tasks that require causal reasoning, which humans typically derive from commonsense knowledge that is not immediately available in the input but implicitly inferred. Prior work has unraveled spurious observational biases that models fall prey to in the absence of causality. While language representation models preserve contextual knowledge within learned embeddings, they do not…

Explain Yourself! Leveraging Language Models for Commonsense Reasoning
This work collects human explanations for commonsense reasoning, in the form of natural-language sequences and highlighted annotations, in a new dataset called Common Sense Explanations, which is used to train language models to automatically generate explanations during both training and inference within a novel Commonsense Auto-Generated Explanation framework.
Learning Common Sense through Visual Abstraction
This work explores the use of human-generated abstract scenes made from clipart for learning common sense and shows that the commonsense knowledge learned in this way is complementary to what can be learned from text sources.
Learning Contextual Causality from Time-consecutive Images
Analysis shows that causal relations are indeed contextual, that taking this context into account may be crucial for applying causality knowledge in real applications, and that the visual signal can serve as a good resource for learning such contextual causality.
iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering
This work proposes iPerceive, a framework capable of understanding the "why" between events in a video by building a commonsense knowledge base that uses contextual cues to infer causal relationships between objects in the video.
Commonsense Reasoning for Natural Language Processing
This tutorial provides researchers with the critical foundations and recent advances in commonsense representation and reasoning, in the hope of casting a brighter light on this promising area of future research.
Don't just listen, use your imagination: Leveraging visual common sense for non-visual tasks
  • Xiaoyu Lin, Devi Parikh
  • Computer Science
  • 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2015
This paper proposes to "imagine" the scene behind the text and to leverage visual cues from the "imagined" scenes, in addition to textual cues, while answering these questions; the approach outperforms a strong text-only baseline on these tasks.
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language. We extend the popular BERT architecture to a…
Visual Commonsense R-CNN
Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN), a novel unsupervised feature-representation learning method, serves as an improved visual region encoder for high-level tasks such as captioning and VQA, yielding consistent performance boosts across them.
Commonsense Reasoning for Natural Language Understanding: A Survey of Benchmarks, Resources, and Approaches
This paper provides an overview of existing tasks and benchmarks, knowledge resources, and learning and inference approaches toward commonsense reasoning for natural language understanding, to support a better understanding of the state of the art, its limitations, and future challenges.
The “Something Something” Video Database for Learning and Evaluating Visual Common Sense
This work describes the ongoing collection of the “something-something” database of video prediction tasks whose solutions require a common sense understanding of the depicted situation, and describes the challenges in crowd-sourcing this data at scale. Expand