Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text

@article{Clark2021IconaryAP,
  title={Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text},
  author={Christopher Clark and Jordi Salvador and Dustin Schwenk and Derrick Bonafilia and Mark Yatskar and Eric Kolve and Alvaro Herrasti and Jonghyun Choi and Sachin Mehta and Sam Skjonsberg and Carissa Schoenick and Aaron Sarnat and Hannaneh Hajishirzi and Aniruddha Kembhavi and Oren Etzioni and Ali Farhadi},
  journal={ArXiv},
  year={2021},
  volume={abs/2112.00800}
}
Communicating with humans is challenging for AIs because it requires a shared understanding of the world, complex semantics (e.g., metaphors or analogies), and at times multi-modal gestures (e.g., pointing with a finger, or an arrow in a diagram). We investigate these challenges in the context of Iconary, a collaborative game of drawing and guessing based on Pictionary, that poses a novel challenge for the research community. In Iconary, a Guesser tries to identify a phrase that a Drawer is… 

References

Game of Sketches: Deep Recurrent Models of Pictionary-style Word Guessing
TLDR
This work introduces the first computational model aimed at Pictionary, the popular word-guessing social game, and proposes a deep neural model which generates guess-words in response to temporally evolving human-drawn sketches.
Pictionary-Style Word Guessing on Hand-Drawn Object Sketches: Dataset, Analysis and Deep Network Models
TLDR
This work introduces the first computational model aimed at Pictionary, the popular word-guessing social game, and proposes a deep neural model which generates guess-words in response to temporally evolving human-drawn object sketches, to mimic Pictionary-style guessing.
Metaphor: A Computational Perspective
TLDR
This book offers a comprehensive approach to the computational treatment of metaphor and its figurative brethren-including simile, analogy, and conceptual blending-that does not shy away from their important cognitive and philosophical dimensions.
Black Holes and White Rabbits: Metaphor Identification with Visual Features
TLDR
This paper presents the first metaphor identification method that simultaneously draws knowledge from linguistic and visual data, as well as being competitive with the best-performing metaphor identification methods, that rely on hand-crafted knowledge about domains and perception.
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers
TLDR
X-LXMERT is introduced, an extension to LXMERT with training refinements including: discretizing visual representations, using uniform masking with a large range of masking ratios and aligning the right pre-training datasets to the right objectives which enables it to paint.
Generating Natural Questions About an Image
TLDR
This paper introduces the novel task of Visual Question Generation, where the system is tasked with asking a natural and engaging question when shown an image, and provides three datasets which cover a variety of images from object-centric to event-centric.
Cooperation and Codenames: Understanding Natural Language Processing via Codenames
TLDR
A number of different natural language processing techniques are evaluated in the context of the Codenames AI framework, attempting to determine how different approaches perform.
Visual Dialog
TLDR
A retrieval-based evaluation protocol for Visual Dialog where the AI agent is asked to sort a set of candidate answers and evaluated on metrics such as mean-reciprocal-rank of human response, and a family of neural encoder-decoder models, which outperform a number of sophisticated baselines.
Diagram Understanding in Geometry Questions
TLDR
This paper presents a method for diagram understanding that identifies visual elements in a diagram while maximizing agreement between textual and visual data, and shows that the method's objective function is submodular.
VQA: Visual Question Answering
We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language
...