Image Captioning with Deep Bidirectional LSTMs and Multi-Task Learning
@article{Wang2018ImageCW,
  title={Image Captioning with Deep Bidirectional LSTMs and Multi-Task Learning},
  author={Cheng Wang and Haojin Yang and C. Meinel},
  journal={ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)},
  year={2018},
  volume={14},
  pages={1--20}
}
Generating a novel and descriptive caption of an image is drawing increasing interest in the computer vision, natural language processing, and multimedia communities. [...] We also explore deep multimodal bidirectional models, in which we increase the depth of nonlinearity transition in different ways to learn hierarchical visual-language embeddings. Data augmentation techniques such as multi-crop, multi-scale, and vertical mirror are proposed to prevent overfitting in training deep models.
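The abstract names three augmentation techniques without giving their parameters. The following is a minimal NumPy sketch of what such a pipeline might look like, not the authors' implementation. The function names, the five-crop layout (four corners plus center), the nearest-neighbour resizing, and the reading of "vertical mirror" as a left-right flip about the vertical axis are all assumptions made for illustration.

```python
import numpy as np


def multi_crop(image, crop_size):
    """Return five square crops: four corners and the center (assumed layout)."""
    h, w = image.shape[:2]
    c = crop_size
    if c > h or c > w:
        raise ValueError("crop_size must not exceed the image size")
    top, left = (h - c) // 2, (w - c) // 2
    offsets = [(0, 0), (0, w - c), (h - c, 0), (h - c, w - c), (top, left)]
    return [image[y:y + c, x:x + c] for y, x in offsets]


def multi_scale(image, scales):
    """Resize to each scale factor with nearest-neighbour sampling."""
    h, w = image.shape[:2]
    resized = []
    for s in scales:
        new_h = max(1, int(round(h * s)))
        new_w = max(1, int(round(w * s)))
        # Map every output row/column back to a source row/column.
        rows = (np.arange(new_h) * h / new_h).astype(int)
        cols = (np.arange(new_w) * w / new_w).astype(int)
        resized.append(image[rows][:, cols])
    return resized


def mirror(image):
    """Flip left-right, i.e. about the vertical axis (an assumed reading)."""
    return image[:, ::-1]


def augment(image, crop_size, scales):
    """Combine the three steps: rescale, crop, then add a mirrored copy."""
    views = []
    for scaled in multi_scale(image, scales):
        if min(scaled.shape[:2]) < crop_size:
            continue  # skip scales too small to crop from
        for crop in multi_crop(scaled, crop_size):
            views.append(crop)
            views.append(mirror(crop))
    return views
```

With two scales, each image yields 2 scales x 5 crops x 2 (original and mirrored) = 20 training views, which shows how these techniques multiply the effective size of the training set.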
48 Citations
Image Captioning using Deep Stacked LSTMs, Contextual Word Embeddings and Data Augmentation
- Computer Science
- ArXiv
- 2021
Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style
- Computer Science
- 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
- 5
Bi-SAN-CAP: Bi-Directional Self-Attention for Image Captioning
- Computer Science
- 2019 Digital Image Computing: Techniques and Applications (DICTA)
- 2019
- 3
A Deep Decoder Structure Based on WordEmbedding Regression for An Encoder-Decoder Based Model for Image Captioning
- Computer Science
- ArXiv
- 2019
- 2
Recall What You See Continually Using GridLSTM in Image Captioning
- Computer Science
- IEEE Transactions on Multimedia
- 2020
- 2
Survey of deep learning and architectures for visual captioning—transitioning between media and natural languages
- Computer Science
- Multimedia Tools and Applications
- 2019
- 9
Dual-path Convolutional Image-Text Embeddings with Instance Loss
- Computer Science
- ACM Trans. Multim. Comput. Commun. Appl.
- 2020
- 63
- Highly Influenced
MRECN: mixed representation enhanced (de)compositional network for caption generation from visual features, modeling as pseudo tensor product representation
- Computer Science
- Int. J. Multim. Inf. Retr.
- 2020
MRRC: Multiple Role Representation Crossover Interpretation for Image Captioning With R-CNN Feature Distribution Composition (FDC)
- Computer Science, Mathematics
- ArXiv
- 2020
- 3