Visual Dialog

@inproceedings{Das2017VisualD,
  title={Visual Dialog},
  author={Abhishek Das and Satwik Kottur and Khushi Gupta and Avi Singh and Deshraj Yadav and Jos{\'e} M. F. Moura and Devi Parikh and Dhruv Batra},
  booktitle={2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2017},
  pages={1080--1089}
}
We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a question about the image, the agent has to ground the question in the image, infer context from the history, and answer the question accurately. Visual Dialog is disentangled enough from a specific downstream task to serve as a general test of machine intelligence, while being…
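To make the task formulation concrete, the following is a minimal Python sketch of the interaction loop the abstract describes: at each round the agent receives the image, the dialog history so far, and a new question, and its answer is appended to the history so later rounds can draw context from it. `DialogState`, `run_dialog`, and the toy agent are hypothetical names for illustration only; the image is reduced to an identifier string, and nothing here reflects the paper's actual data format or models.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical container (not the paper's format): the image is reduced to an
# identifier, and the history is a list of (question, answer) rounds.
@dataclass
class DialogState:
    image_id: str
    history: List[Tuple[str, str]] = field(default_factory=list)

# An agent is modeled as any function (image, history, question) -> answer.
Agent = Callable[[str, List[Tuple[str, str]], str], str]

def run_dialog(agent: Agent, state: DialogState, questions: List[str]) -> List[str]:
    """Ask each question in turn, appending every answered round to the history."""
    answers = []
    for question in questions:
        # The agent sees a copy of the history accumulated before this round.
        answer = agent(state.image_id, list(state.history), question)
        state.history.append((question, answer))
        answers.append(answer)
    return answers

# Toy agent that only reports how many earlier rounds it was given.
def toy_agent(image_id: str, history: List[Tuple[str, str]], question: str) -> str:
    return str(len(history))

state = DialogState("img_001")
print(run_dialog(toy_agent, state, ["Is there a dog?", "What color is it?"]))
# prints ['0', '1']
```

The toy agent ignores the image entirely; it only shows that the second question ("What color is it?") arrives with one prior round of history, which a real agent would need in order to resolve "it".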

Citations

Publications citing this paper (13 citations in total).

The PhotoBook Dataset: Building Common Ground through Visually-Grounded Dialogue

CITES BACKGROUND & RESULTS
HIGHLY INFLUENCED

Visual Question Answering and Beyond

CITES BACKGROUND

Exploring Semantic Relationships for Image Captioning without Parallel Data

CITES BACKGROUND

Long-Form Video Question Answering via Dynamic Hierarchical Reinforced Networks

CITES BACKGROUND

References

Publications referenced by this paper (71 references in total).

Learning End-to-End Goal-Oriented Dialog

HIGHLY INFLUENTIAL

VQA: Visual Question Answering

Yin and Yang: Balancing and Answering Binary Visual Questions

Microsoft COCO: Common Objects in Context

T.-Y. Lin, M. Maire, +3 authors, D. Ramanan, P. Dollár, and C. L. Zitnick. In ECCV, 2014.
HIGHLY INFLUENTIAL

Figure 18: Selected examples of attention over history facts from our Memory Network encoder. The intensity of color in each row indicates the strength of attention placed on that round by the model.

Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering

H. Gao, J. Mao, +3 authors, W. Xu. In NIPS, 2015.
HIGHLY INFLUENTIAL

Knowledge Guided Disambiguation for Large-Scale Scene Classification With Multi-Resolution CNNs.
