Translating Videos to Natural Language Using Deep Recurrent Neural Networks

@inproceedings{Austin2017TranslatingVT,
  title={Translating Videos to Natural Language Using Deep Recurrent Neural Networks},
  author={Subhashini Venugopalan and Huijuan Xu and Jeff Donahue and Marcus Rohrbach and Raymond Mooney and Kate Saenko},
  year={2015}
}
Solving the visual symbol grounding problem has long been a goal of artificial intelligence. The field appears to be advancing closer to this goal with recent breakthroughs in deep learning for natural language grounding in static images. In this paper, we propose to translate videos directly to sentences using a unified deep neural network with both convolutional and recurrent structure. Described video datasets are scarce, and most existing methods have been applied to toy domains with a…
This paper has highly influenced 62 other papers.
This paper has 494 citations.

Citations

Publications citing this paper.
Showing 1-10 of 325 extracted citations

Automatic video description generation via LSTM with joint two-stream encoding

2016 23rd International Conference on Pattern Recognition (ICPR) • 2016

Fine-Grained Video Captioning for Sports Narrative

CVPR • 2018

Dense-Captioning Events in Videos

2017 IEEE International Conference on Computer Vision (ICCV) • 2017

Improving Interpretability of Deep Neural Networks with Semantic Information

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 2017

494 Citations

Citations per Year (chart omitted; axis ticks ran from 0 to 150 citations over 2015 to 2019)
Semantic Scholar estimates that this publication has 494 citations based on the available data.

References

Publications referenced by this paper.
Showing 1-10 of 50 references

Show and tell: A neural image caption generator

2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 2015

Microsoft COCO: Common Objects in Context

ImageNet Large Scale Visual Recognition Challenge

International Journal of Computer Vision • 2015

Translating Video Content to Natural Language Descriptions

2013 IEEE International Conference on Computer Vision • 2013

YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition

2013 IEEE International Conference on Computer Vision • 2013