MTLE: A Multitask Learning Encoder of Visual Feature Representations for Video and Movie Description
@article{Nina2018MTLEAM,
  title={MTLE: A Multitask Learning Encoder of Visual Feature Representations for Video and Movie Description},
  author={Oliver Nina and W. Garcia and Scott Clouse and A. Yilmaz},
  journal={ArXiv},
  year={2018},
  volume={abs/1809.07257}
}
Learning visual feature representations for video analysis is a daunting task that requires a large number of training samples and a proper generalization framework. Many of the current state-of-the-art methods for video captioning and movie description rely on simple encoding mechanisms through recurrent neural networks to encode temporal visual information extracted from video data. In this paper, we introduce a novel multitask encoder-decoder framework for automatic semantic description and…
2 Citations
Adversarial Video Captioning
- Computer Science
- 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)
- 2019
References
Describing Videos by Exploiting Temporal Structure
- Computer Science, Mathematics
- 2015 IEEE International Conference on Computer Vision (ICCV)
- 2015
- 750
- Highly Influential
Sequence to Sequence -- Video to Text
- Computer Science
- 2015 IEEE International Conference on Computer Vision (ICCV)
- 2015
- 893
Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning
- Computer Science
- 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
- 286
MSR-VTT: A Large Video Description Dataset for Bridging Video and Language
- Computer Science
- 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
- 418
- Highly Influential
Learning explicit video attributes from mid-level representation for video captioning
- Computer Science
- Comput. Vis. Image Underst.
- 2017
- 14
Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild
- Computer Science
- COLING
- 2014
- 164
Video Captioning with Transferred Semantic Attributes
- Computer Science
- 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
- 184