An Empirical Study of Language CNN for Image Captioning
@article{Gu2017AnES,
  title={An Empirical Study of Language CNN for Image Captioning},
  author={Jiuxiang Gu and G. Wang and Jianfei Cai and Tsuhan Chen},
  journal={2017 IEEE International Conference on Computer Vision (ICCV)},
  year={2017},
  pages={1231-1240}
}
Language models based on recurrent neural networks have dominated recent image caption generation tasks. In this paper, we introduce a language CNN model which is suitable for statistical language modeling tasks and shows competitive performance in image captioning. In contrast to previous models, which predict the next word based only on one previous word and a hidden state, our language CNN is fed with all the previous words and can model the long-range dependencies among history words, which are critical…
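To make the contrast concrete, below is a minimal sketch (not the authors' implementation) of the idea: instead of an RNN step that sees only the last word and a hidden state, a language CNN convolves over the embeddings of all previously generated words and pools them into a single vector before predicting the next word. The vocabulary size, history length, and layer widths are illustrative assumptions, and the image feature that the full captioning model also conditions on is omitted here.

```python
# Minimal sketch of a "language CNN" next-word predictor (illustrative only;
# sizes and layer choices are assumptions, not the paper's exact architecture).
import torch
import torch.nn as nn

class LanguageCNN(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=256, hist_len=16, channels=256):
        super().__init__()
        self.hist_len = hist_len
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # 1-D convolutions over the time axis of the word-history matrix,
        # followed by pooling the whole history into one vector.
        self.conv = nn.Sequential(
            nn.Conv1d(embed_dim, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.classifier = nn.Linear(channels, vocab_size)

    def forward(self, history_ids):
        # history_ids: (batch, hist_len) ids of ALL previously generated words,
        # left-padded with 0 when the caption so far is shorter than hist_len.
        x = self.embed(history_ids)      # (batch, hist_len, embed_dim)
        x = x.transpose(1, 2)            # (batch, embed_dim, hist_len)
        h = self.conv(x).squeeze(-1)     # (batch, channels)
        return self.classifier(h)        # logits over the next word

# Usage: score the next word given a padded history of previous word ids.
model = LanguageCNN()
history = torch.zeros(2, 16, dtype=torch.long)
history[:, -3:] = torch.tensor([[5, 42, 7], [9, 13, 1]])
logits = model(history)                  # shape: (2, vocab_size)
```

Because the convolution sees the full history matrix at once, dependencies between distant words can influence the prediction directly, rather than having to survive many recurrent state updates.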
85 Citations
Long-Term Recurrent Merge Network Model for Image Captioning
- Computer Science2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI)
- 2018
A Long-term Recurrent Merge Network (LRMN) model is proposed to merge the image feature at each step via a language model, which not only improves the accuracy of image captioning but also describes the image better.
A Survey on Image Encoders and Language Models for Image Captioning
- Computer Science
- 2021
The image encoders and language models used by state-of-the-art image captioning models are discussed, with attention to the convolutional neural networks (CNNs) they apply.
A Survey on Image Captioning datasets and Evaluation Metrics
- Computer Science
- 2021
Various datasets and evaluation metrics useful for the image captioning task are discussed, and the datasets and evaluation metrics used by state-of-the-art image captioning models are summarized.
Image Captioning Using R-CNN & LSTM Deep Learning Model
- Computer Science
- 2021
A Fully Convolutional Localization Network is proposed that processes an image with a single forward pass and can be trained end-to-end in a single round of optimization.
Attention-Based Deep Learning Model for Image Captioning: A Comparative Study
- Computer ScienceInternational Journal of Image, Graphics and Signal Processing
- 2019
This paper proposes the comparative study for attention-based deep learning model for image captioning, and presents the basic analyzing techniques for performance, advantages, and weakness.
Survey of convolutional neural networks for image captioning
- Computer Science
- 2020
A survey is provided of models that use a CNN for image embedding and an RNN for language modeling and prediction, along with the advantages that make this direction worth exploring.
A Multi-task Learning Approach for Image Captioning
- Computer ScienceIJCAI
- 2018
The experimental results demonstrate that the proposed Multi-task Learning Approach for Image Captioning achieves impressive results compared to other strong competitors.
A Hybridized Deep Learning Method for Bengali Image Captioning
- Computer Science
- 2021
A standard strategy is proposed for Bengali image caption generation on two different sizes of the Flickr8k dataset and on the BanglaLekha dataset, the only publicly available Bengali dataset for image captioning.
Recurrent Fusion Network for Image Captioning
- Computer ScienceECCV
- 2018
This paper proposes a novel recurrent fusion network (RFNet) for the image captioning task, which can exploit the interactions among the outputs of the image encoders and generate new compact and informative representations for the decoder.
Dual-CNN: A Convolutional language decoder for paragraph image captioning
- Computer ScienceNeurocomputing
- 2020
References
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
- Computer ScienceICLR
- 2015
The m-RNN model directly models the probability distribution of generating a word given previous words and an image, and achieves significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval.
genCNN: A Convolutional Architecture for Word Sequence Prediction
- Computer ScienceACL
- 2015
It is argued that the proposed novel convolutional architecture, named genCNN, can give an adequate representation of the history, and therefore can naturally exploit both short- and long-range dependencies.
Phrase-based Image Captioning
- Computer ScienceICML
- 2015
This paper presents a simple model that is able to generate descriptive sentences given a sample image and proposes a simple language model that can produce relevant descriptions for a given test image using the phrases inferred.
A Convolutional Neural Network for Modelling Sentences
- Computer ScienceACL
- 2014
A convolutional architecture dubbed the Dynamic Convolutional Neural Network (DCNN) is described that is adopted for the semantic modelling of sentences and induces a feature graph over the sentence that is capable of explicitly capturing short and long-range relations.
Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge
- Computer ScienceIEEE Transactions on Pattern Analysis and Machine Intelligence
- 2017
A generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image is presented.
Guiding the Long-Short Term Memory Model for Image Caption Generation
- Computer Science2015 IEEE International Conference on Computer Vision (ICCV)
- 2015
In this work we focus on the problem of image caption generation. We propose an extension of the long short-term memory (LSTM) model, which we coin gLSTM for short. In particular, we add semantic…
Show and tell: A neural image caption generator
- Computer Science2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
This paper presents a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image.
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
- Computer Science2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
A Fully Convolutional Localization Network (FCLN) architecture is proposed that processes an image with a single, efficient forward pass, requires no external region proposals, and can be trained end-to-end with a single round of optimization.
From captions to visual concepts and back
- Computer Science2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
This paper uses multiple instance learning to train visual detectors for words that commonly occur in captions, including many different parts of speech such as nouns, verbs, and adjectives, and develops a maximum-entropy language model.