Imagine This! Scripts to Compositions to Videos

@inproceedings{Gupta2018ImagineTS,
  title={Imagine This! Scripts to Compositions to Videos},
  author={Tanmay Gupta and Dustin Schwenk and Ali Farhadi and Derek Hoiem and Aniruddha Kembhavi},
  booktitle={ECCV},
  year={2018}
}
Imagining a scene described in natural language with realistic layout and appearance of entities is the ultimate test of spatial, visual, and semantic world knowledge. Towards this goal, we present the Composition, Retrieval and Fusion Network (Craft), a model capable of learning this knowledge from video-caption data and applying it while generating videos from novel captions. Craft explicitly predicts a temporal-layout of mentioned entities (characters and objects), retrieves spatio-temporal…
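The abstract says Craft predicts a temporal layout of the entities (characters and objects) mentioned in a caption. One way to picture such a layout is as a track of per-frame bounding boxes for each entity. The sketch below assumes that representation: the `TemporalLayout` class, the `add_entity` and `layout_at` methods, the normalized `(x, y, width, height)` box format, and the entity names are all hypothetical illustrations, not the paper's actual data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical box format (an assumption): (x, y, width, height), normalized to [0, 1].
Box = Tuple[float, float, float, float]


@dataclass
class TemporalLayout:
    """Illustrative temporal layout: one optional box per frame for each entity.

    This is an assumed representation for illustration only, not Craft's own.
    """
    num_frames: int
    tracks: Dict[str, List[Optional[Box]]] = field(default_factory=dict)

    def add_entity(self, name: str, boxes: List[Optional[Box]]) -> None:
        # Each entity supplies exactly one entry per frame (None means absent in that frame).
        if len(boxes) != self.num_frames:
            raise ValueError(f"{name}: expected {self.num_frames} boxes, got {len(boxes)}")
        self.tracks[name] = boxes

    def layout_at(self, t: int) -> Dict[str, Box]:
        # Spatial layout of frame t: only the entities present in that frame, with their boxes.
        return {name: boxes[t] for name, boxes in self.tracks.items() if boxes[t] is not None}


# Usage: a hypothetical character and object over three frames; the object appears at frame 1.
layout = TemporalLayout(num_frames=3)
layout.add_entity("character_a", [(0.1, 0.5, 0.2, 0.4), (0.15, 0.5, 0.2, 0.4), (0.2, 0.5, 0.2, 0.4)])
layout.add_entity("ball", [None, (0.6, 0.7, 0.05, 0.05), (0.55, 0.6, 0.05, 0.05)])
print(layout.layout_at(0))          # {'character_a': (0.1, 0.5, 0.2, 0.4)}
print(sorted(layout.layout_at(1)))  # ['ball', 'character_a']
```

Slicing the tracks at a single frame index recovers an ordinary per-frame spatial layout, which is what makes the layout "temporal": the same entities are tracked consistently across frames.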