Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text

Christopher Clark, Jordi Salvador, Dustin Schwenk, Derrick Bonafilia, Mark Yatskar, Eric Kolve, Alvaro Herrasti, Jonghyun Choi, Sachin Mehta, Sam Skjonsberg, Carissa Schoenick, Aaron Sarnat, Hannaneh Hajishirzi, Aniruddha Kembhavi, Oren Etzioni, Ali Farhadi
Communicating with humans is challenging for AIs because it requires a shared understanding of the world, complex semantics (e.g., metaphors or analogies), and at times multi-modal gestures (e.g., pointing with a finger, or an arrow in a diagram). We investigate these challenges in the context of Iconary, a collaborative game of drawing and guessing based on Pictionary, which poses a novel challenge for the research community. In Iconary, a Guesser tries to identify a phrase that a Drawer is… 

DrawMon: A Distributed System for Detection of Atypical Sketch Content in Concurrent Pictionary Games

This work introduces DrawMon, a novel distributed framework for automatically detecting atypical sketch content in concurrently occurring Pictionary game sessions. It also builds specialized online interfaces to collect game session data and annotate atypical sketch content, resulting in AtyPict, the first-ever atypical sketch content dataset.

Game of Sketches: Deep Recurrent Models of Pictionary-style Word Guessing

A deep neural model is proposed that generates guess-words in response to temporally evolving human-drawn sketches to mimic Pictionary-style guessing. It also makes human-like mistakes while guessing, which makes its behavior more closely resemble that of a human player.

Pictionary-Style Word Guessing on Hand-Drawn Object Sketches: Dataset, Analysis and Deep Network Models

This work introduces the first computational model aimed at Pictionary, the popular word-guessing social game, and proposes a deep neural model which generates guess-words in response to temporally evolving human-drawn object sketches, to mimic Pictionary-style guessing.

X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers

This work introduces X-LXMERT, an extension of LXMERT with several training refinements: discretized visual representations, uniform masking over a large range of masking ratios, and alignment of each pre-training dataset with the appropriate objective. Together, these refinements enable the model to paint, that is, to generate images.

Generating Natural Questions About an Image

This paper introduces the novel task of Visual Question Generation, where the system is tasked with asking a natural and engaging question when shown an image, and provides three datasets which cover a variety of images from object-centric to event-centric.

Cooperation and Codenames: Understanding Natural Language Processing via Codenames

This work evaluates a number of natural language processing techniques within the Codenames AI framework to determine how the different approaches perform.

Diagram Understanding in Geometry Questions

This paper presents a method for diagram understanding that identifies visual elements in a diagram while maximizing agreement between textual and visual data, and shows that the method's objective function is submodular.

VQA: Visual Question Answering

We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer.

A Corpus for Reasoning about Natural Language Grounded in Photographs

This work introduces a new dataset for joint reasoning about natural language and images, with a focus on semantic diversity, compositionality, and visual reasoning challenges. Evaluation using state-of-the-art visual reasoning methods shows that the data presents a strong challenge.

Situation Recognition: Visual Semantic Role Labeling for Image Understanding

This paper introduces situation recognition, the problem of producing a concise summary of the situation an image depicts, including: (1) the main activity (e.g., clipping), (2) the participating…

ChatPainter: Improving Text to Image Generation using Dialogue

This work shows that adding a dialogue that further describes the scene leads to a significant improvement in both the Inception Score and the quality of generated images on the MS COCO dataset.