Beyond task success: A closer look at jointly learning to see, ask, and GuessWhat
@inproceedings{Shekhar2019BeyondTS,
  title={Beyond task success: A closer look at jointly learning to see, ask, and GuessWhat},
  author={Ravi Shekhar and Aashish Venkatesh and Tim Baumg{\"a}rtner and Elia Bruni and Barbara Plank and R. Bernardi and R. Fern{\'a}ndez},
  booktitle={NAACL-HLT},
  year={2019}
}
We propose a grounded dialogue state encoder which addresses a foundational issue on how to integrate visual grounding with dialogue system components. As a test-bed, we focus on the GuessWhat?! game, a two-player game where the goal is to identify an object in a complex visual scene by asking a sequence of yes/no questions. Our visually-grounded encoder leverages synergies between guessing and asking questions, as it is trained jointly using multi-task learning. We further enrich our model via…
Supplemental Code
[NAACL 2019] PyTorch code for "Beyond task success: A closer look at jointly learning to see, ask, and GuessWhat"
18 Citations
Imagining Grounded Conceptual Representations from Perceptual Information in Situated Guessing Games
- Computer Science
- COLING
- 2020
- Highly Influenced
- PDF
An Empirical Study on the Generalization Power of Neural Representations Learned via Visual Guessing Games
- Computer Science
- ArXiv
- 2021
- Highly Influenced
- PDF
Answer-Driven Visual State Estimator for Goal-Oriented Visual Dialogue
- Computer Science
- ACM Multimedia
- 2020
- Highly Influenced
- PDF
Jointly Learning to See, Ask, Decide when to Stop, and then GuessWhat
- Computer Science
- CLiC-it
- 2019
- 2
- PDF
They are not all alike: answering different spatial questions requires different grounding strategies
- Computer Science
- SPLU
- 2020
- Highly Influenced
- PDF
Which Turn do Neural Models Exploit the Most to Solve GuessWhat? Diving into the Dialogue History Encoding in Transformers and LSTMs
- Computer Science
- NL4AI@AI*IA
- 2020
- PDF
Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
- Computer Science, Mathematics
- ECCV
- 2020
- 14
- PDF