Imagine This! Scripts to Compositions to Videos

Tanmay Gupta, Dustin Schwenk, Ali Farhadi, Derek Hoiem, Aniruddha Kembhavi
Imagining a scene described in natural language with realistic layout and appearance of entities is the ultimate test of spatial, visual, and semantic world knowledge. Towards this goal, we present the Composition, Retrieval and Fusion Network (Craft), a model capable of learning this knowledge from video-caption data and applying it while generating videos from novel captions. Craft explicitly predicts a temporal-layout of mentioned entities (characters and objects), retrieves spatio-temporal…

