GuessWhat?! Visual Object Discovery through Multi-modal Dialogue

Harm de Vries, Florian Strub, A. P. Sarath Chandar, Olivier Pietquin, Hugo Larochelle, Aaron C. Courville
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

We introduce GuessWhat?!, a two-player guessing game as a testbed for research on the interplay of computer vision and dialogue systems. The goal of the game is to locate an unknown object in a rich image scene by asking a sequence of questions. Higher-level image understanding, like spatial reasoning and language grounding, is required to solve the proposed task. Our key contribution is the collection of a large-scale dataset consisting of 150K human-played games with a total of 800K visual…
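To make the game mechanics concrete, the toy Python sketch below simulates one round in which a questioner narrows down the candidate objects in a scene until one remains. It rests on loudly stated assumptions that are not taken from the paper: the scene is a hypothetical list of hand-written attribute dictionaries, the oracle answers only Yes or No to questions of the form "Is its <attribute> <value>?", and the questioner uses a greedy even-split heuristic. It is not the GuessWhat?! data format and not the paper's learned models.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy representation (not the GuessWhat?! data format):
# each object is a dict of hand-written attributes plus a unique "id".
Obj = Dict[str, str]
Question = Tuple[str, str]  # (attribute, value), read as "Is its <attribute> <value>?"


def oracle(target: Obj, question: Question) -> str:
    """Answer a question about the hidden target (assumed Yes/No only)."""
    attr, value = question
    return "Yes" if target.get(attr) == value else "No"


def best_question(candidates: List[Obj]) -> Optional[Question]:
    """Greedy heuristic: choose the question that splits candidates most evenly."""
    best: Optional[Question] = None
    best_gap: Optional[int] = None
    for obj in candidates:
        for attr, value in obj.items():
            if attr == "id":
                continue  # asking about the id would give the answer away
            yes = sum(1 for o in candidates if o.get(attr) == value)
            gap = abs(2 * yes - len(candidates))  # 0 means a perfect 50/50 split
            if best_gap is None or gap < best_gap:
                best, best_gap = (attr, value), gap
    return best


def play(scene: List[Obj], target: Obj,
         max_questions: int = 5) -> Tuple[str, List[Tuple[Question, str]]]:
    """Run one game; return the guessed object id and the question-answer dialogue."""
    candidates = list(scene)
    dialogue: List[Tuple[Question, str]] = []
    for _ in range(max_questions):
        if len(candidates) <= 1:
            break
        question = best_question(candidates)
        if question is None:
            break
        answer = oracle(target, question)
        dialogue.append((question, answer))
        attr, value = question
        # Keep only objects consistent with the oracle's answer.
        candidates = [o for o in candidates
                      if (o.get(attr) == value) == (answer == "Yes")]
    # Guesser: pick the first remaining candidate (the target is never filtered out).
    return candidates[0]["id"], dialogue


scene = [
    {"id": "cup_1", "category": "cup", "color": "red", "side": "left"},
    {"id": "cup_2", "category": "cup", "color": "blue", "side": "right"},
    {"id": "dog_1", "category": "dog", "color": "brown", "side": "left"},
    {"id": "chair_1", "category": "chair", "color": "red", "side": "right"},
]
guess, dialogue = play(scene, target=scene[1])
print(guess, dialogue)
```

Because each question is chosen to split the remaining candidates as evenly as possible, at best every answer halves the candidate set; in this toy scene two questions ("Is its category cup?", then "Is its color red?") are enough to single out `cup_2`.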