Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents

Ye Zhu, Yu Wu, Yi Yang, Yan Yan
With rising concerns about AI systems that are given direct access to abundant sensitive information, researchers seek to develop more reliable AI that works from implicit information sources. To this end, we introduce a new task, video description via two multi-modal cooperative dialog agents, whose ultimate goal is for one conversational agent to describe an unseen video based on the dialog and two static frames. Specifically, one of the intelligent agents, Q-BOT, is given…

