Beyond Language: Learning Commonsense from Images for Reasoning

Wanqing Cui, Yanyan Lan, Liang Pang, Jiafeng Guo, Xueqi Cheng
This paper proposes a novel approach to learning commonsense from images, instead of from limited raw texts or costly constructed knowledge bases, for the commonsense reasoning problem in NLP. Our motivation comes from the fact that an image is worth a thousand words: richer scene information can be leveraged to help distill commonsense knowledge that is often left implicit in language. Our approach, named Loire, consists of two stages. In the first stage, a bi-modal sequence-to-sequence…


Things not Written in Text: Exploring Spatial Commonsense from Visual Signals
Spatial commonsense, the knowledge of spatial positions and relationships between objects (such as the relative size of a lion and a girl, or the position of a boy relative to a bicycle when cycling)…
Teaching Pretrained Models with Commonsense Reasoning: A Preliminary KB-Based Approach
This work proposes a simple yet effective method to teach pretrained models with commonsense reasoning by leveraging the structured knowledge in ConceptNet, the largest commonsense knowledge base (KB).
Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models
Experimental results demonstrate that pre-training models using the proposed approach followed by fine-tuning achieve significant improvements over previous state-of-the-art models on two commonsense-related benchmarks, including CommonsenseQA and Winograd Schema Challenge.
Explain Yourself! Leveraging Language Models for Commonsense Reasoning
This work collects human explanations for commonsense reasoning, in the form of natural language sequences and highlighted annotations, in a new dataset called Common Sense Explanations, which is used to train language models to automatically generate explanations that can be used during training and inference in a novel Commonsense Auto-Generated Explanation framework.
Incorporating Relation Knowledge into Commonsense Reading Comprehension with Multi-task Learning
This paper designs two auxiliary relation-aware tasks to predict whether any commonsense relation exists between two words and what its relation type is, in order to better model the interactions between the document and candidate answer options.
KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning
This paper proposes a textual inference framework for answering commonsense questions, which effectively utilizes external, structured commonsense knowledge graphs to perform explainable inferences.
Cracking the Contextual Commonsense Code: Understanding Commonsense Reasoning Aptitude of Deep Contextual Representations
This work probes and challenges several aspects of BERT's commonsense representation abilities, develops a method of fine-tuning knowledge graph embeddings alongside BERT, and shows the continued importance of explicit knowledge graphs.
Improving Question Answering by Commonsense-Based Pre-Training
Results show that commonsense-based pre-training improves the baseline on three question answering tasks that require commonsense reasoning, leveraging useful evidence from an external commonsense knowledge base that is missing in existing neural network models.
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
This work presents CommonsenseQA: a challenging new dataset for commonsense question answering, which extracts from ConceptNet multiple target concepts that have the same semantic relation to a single source concept.
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language. We extend the popular BERT architecture to a multi-modal two-stream model, processing visual and textual inputs in separate streams that interact through co-attentional transformer layers.
G-DAug: Generative Data Augmentation for Commonsense Reasoning
This work proposes a novel generative data augmentation technique, G-DAUG^C, that aims to achieve more accurate and robust learning in low-resource settings and produces a diverse set of fluent training examples, demonstrating that its selection and training approaches are important for performance.