Jointly Localizing and Describing Events for Dense Video Captioning

@inproceedings{Li2018JointlyLA,
  title={Jointly Localizing and Describing Events for Dense Video Captioning},
  author={Yehao Li and Ting Yao and Yingwei Pan and Hongyang Chao and Tao Mei},
  booktitle={2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2018},
  pages={7492--7500}
}
Automatically describing a video with natural language is regarded as a fundamental challenge in computer vision. The problem nevertheless is not trivial especially when a video contains multiple events to be worthy of mention, which often happens in real videos. A valid question is how to temporally localize and then describe events, which is known as "dense video captioning." In this paper, we present a novel framework for dense video captioning that unifies the localization of temporal event…

Citations

Publications citing this paper.

Streamlined Dense Video Captioning

CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems

Hung Le, Doyen Sahoo, Nancy F. Chen, Steven C.H. Hoi
  • ACL 2019
  • 2019

Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning

Wei Zhang, Bairui Wang, Lin Ma, Wei Liu
  • IEEE transactions on pattern analysis and machine intelligence
  • 2019
CITES BACKGROUND

References

Publications referenced by this paper.

Dense-Captioning Events in Videos

  • 2017 IEEE International Conference on Computer Vision (ICCV)
  • 2017
HIGHLY INFLUENTIAL

Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks

  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2015
HIGHLY INFLUENTIAL

TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals

  • 2017 IEEE International Conference on Computer Vision (ICCV)
  • 2017
HIGHLY INFLUENTIAL

Temporal Action Detection with Structured Segment Networks

  • 2017 IEEE International Conference on Computer Vision (ICCV)
  • 2017
HIGHLY INFLUENTIAL

Sequence to Sequence -- Video to Text

  • 2015 IEEE International Conference on Computer Vision (ICCV)
  • 2015
HIGHLY INFLUENTIAL

Learning Spatiotemporal Features with 3D Convolutional Networks

  • 2015 IEEE International Conference on Computer Vision (ICCV)
  • 2014
HIGHLY INFLUENTIAL

Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks

  • 2017 IEEE International Conference on Computer Vision (ICCV)
  • 2017
HIGHLY INFLUENTIAL

Describing Videos by Exploiting Temporal Structure

  • 2015 IEEE International Conference on Computer Vision (ICCV)
  • 2015
HIGHLY INFLUENTIAL