Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks

@article{Yu2016VideoPC,
  title={Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks},
  author={Haonan Yu and J. Wang and Zhiheng Huang and Y. Yang and W. Xu},
  journal={2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2016},
  pages={4584-4593}
}
  • Haonan Yu, J. Wang, +2 authors W. Xu
  • Published 2016
  • Computer Science
  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • We present an approach that exploits hierarchical Recurrent Neural Networks (RNNs) to tackle the video captioning problem, i.e., generating one or multiple sentences to describe a realistic video. [...] Key Method It exploits both temporal-and spatial-attention mechanisms to selectively focus on visual elements during generation.Expand Abstract

    Figures, Tables, and Topics from this paper.

    Hierarchical Boundary-Aware Neural Encoder for Video Captioning
    • 115
    • PDF
    Move Forward and Tell: A Progressive Generator of Video Descriptions
    • 17
    • Highly Influenced
    • PDF
    A Hierarchical Approach for Generating Descriptive Image Paragraphs
    • 148
    • PDF
    Memory-Attended Recurrent Network for Video Captioning
    • 21
    • PDF
    Video Captioning with Transferred Semantic Attributes
    • 169
    • PDF
    Video Captioning via Hierarchical Reinforcement Learning
    • 90
    • Highly Influenced
    • PDF
    Learning Multimodal Attention LSTM Networks for Video Captioning
    • 55
    • Highly Influenced
    • PDF

    References

    Publications referenced by this paper.
    SHOWING 1-10 OF 69 REFERENCES
    A Hierarchical Neural Autoencoder for Paragraphs and Documents
    • 458
    • Highly Influential
    • PDF
    Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
    • 812
    • PDF
    Show and tell: A neural image caption generator
    • 3,419
    • PDF
    Describing Videos by Exploiting Temporal Structure
    • 714
    • Highly Influential
    • PDF
    Sequence to Sequence -- Video to Text
    • 824
    • Highly Influential
    • PDF
    Learning a Recurrent Visual Representation for Image Caption Generation
    • 151
    • PDF
    Translating Videos to Natural Language Using Deep Recurrent Neural Networks
    • 537
    • Highly Influential
    Jointly Modeling Embedding and Translation to Bridge Video and Language
    • 339
    • Highly Influential
    • PDF
    A Neural Conversational Model
    • 1,160
    • PDF