Abstract Visual Reasoning with Tangram Shapes
@inproceedings{Ji2022AbstractVR,
  title     = {Abstract Visual Reasoning with Tangram Shapes},
  author    = {Anya Ji and Noriyuki Kojima and Noah Rush and Alane Suhr and Wai Keen Vong and Robert D. Hawkins and Yoav Artzi},
  booktitle = {Conference on Empirical Methods in Natural Language Processing},
  year      = {2022}
}
We introduce KiloGram, a resource for studying abstract visual reasoning in humans and machines. Drawing on the history of tangram puzzles as stimuli in cognitive science, we build a richly annotated dataset that, with >1k distinct stimuli, is orders of magnitude larger and more diverse than prior resources. It is both visually and linguistically richer, moving beyond whole shape descriptions to include segmentation maps and part labels. We use this resource to evaluate the abstract visual…
One Citation
Do language models have coherent mental models of everyday things?
- Computer Science, ArXiv
- 2022
A simple extension to pre-trained language models like GPT-3 and Macaw is proposed, in which a constraint satisfaction layer is applied on top of raw predictions from LMs to produce more consistent and accurate mental models of the parts of everyday things.
References
Showing 1-10 of 58 references
Natural Reference to Objects in a Visual Domain
- Philosophy, INLG
- 2010
This paper presents a study designed to elicit naturalistic referring expressions for relatively complex objects, and identifies aspects of reference that have not been accounted for in work on Referring Expression Generation (REG).
A Corpus for Reasoning about Natural Language Grounded in Photographs
- Computer Science, ACL
- 2019
This work introduces a new dataset for joint reasoning about natural language and images, with a focus on semantic diversity, compositionality, and visual reasoning challenges; evaluation using state-of-the-art visual reasoning methods shows the data presents a strong challenge.
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
- Computer Science, NeurIPS
- 2019
We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language. We extend the popular BERT architecture to a…
Common object representations for visual recognition and production
- Psychology, CogSci
- 2015
It is found that repeatedly sketched objects were better recognized after training, while recognition of unpracticed but similar objects worsened, showing that visual production can reshape the representational space for objects by differentiating trained objects and merging other nearby objects in the space.
A Corpus of Natural Language for Visual Reasoning
- Computer Science, ACL
- 2017
This work presents a method of crowdsourcing linguistically diverse data; an analysis of the data demonstrates a broad set of linguistic phenomena requiring visual and set-theoretic reasoning.
Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
- Computer Science, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2022
A novel task and dataset, Winoground, for evaluating the ability of vision and language models to conduct visio-linguistic compositional reasoning; surprisingly, none of the models evaluated do much better than chance.
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
- Computer Science, International Journal of Computer Vision
- 2016
The Visual Genome dataset is presented, which contains over 108K images where each image has an average of 35 objects, 26 attributes, and 21 pairwise relationships between objects, and represents the densest and largest dataset of image descriptions, objects, attributes, relationships, and question-answer pairs.
Learning to communicate about shared procedural abstractions
- Computer Science, ArXiv
- 2021
The results shed light on the inductive biases that enable intelligent agents to coordinate on shared procedural abstractions; the authors propose that concepts may be represented by structured programs written in a domain-specific language (DSL).
Generation and Comprehension of Unambiguous Object Descriptions
- Computer Science, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
This work proposes a method that can generate an unambiguous description of a specific object or region in an image, and can also comprehend or interpret such an expression to infer which object is being described; the method outperforms previous approaches that generate object descriptions without taking into account other potentially ambiguous objects in the scene.
MultiPic: A standardized set of 750 drawings with norms for six European languages
- Linguistics, Quarterly Journal of Experimental Psychology
- 2018
MultiPic, a new set of 750 colored pictures of concrete concepts, constitutes a valuable tool for cognitive scientists investigating language, visual perception, memory, and/or attention in monolingual or multilingual populations.