Corpus ID: 52305408

MTLE: A Multitask Learning Encoder of Visual Feature Representations for Video and Movie Description

  • Oliver Nina, W. Garcia, Scott Clouse, A. Yilmaz
  • Published 2018
  • Computer Science, Mathematics
  • ArXiv
  • Learning visual feature representations for video analysis is a daunting task that requires a large number of training samples and a proper generalization framework. Many of the current state-of-the-art methods for video captioning and movie description rely on simple encoding mechanisms through recurrent neural networks to encode temporal visual information extracted from video data. In this paper, we introduce a novel multitask encoder-decoder framework for automatic semantic description and…
    2 Citations
    SibNet: Sibling Convolutional Encoder for Video Captioning
    • 22
    Adversarial Video Captioning


    Describing Videos by Exploiting Temporal Structure
    • 750
    • Highly Influential
    Sequence to Sequence -- Video to Text
    • 893
    Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning
    • 286
    The Long-Short Story of Movie Description
    • 73
    • Highly Influential
    MSR-VTT: A Large Video Description Dataset for Bridging Video and Language
    • 418
    • Highly Influential
    Learning explicit video attributes from mid-level representation for video captioning
    • 14
    Title Generation for User Generated Videos
    • 39
    Video Captioning with Transferred Semantic Attributes
    • 184