Show and tell: A neural image caption generator

@inproceedings{Vinyals2015ShowAT,
  title={Show and tell: A neural image caption generator},
  author={Oriol Vinyals and Alexander Toshev and Samy Bengio and Dumitru Erhan},
  booktitle={2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2015},
  pages={3156--3164}
}
Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. [...] The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively…
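The training objective mentioned in the abstract (maximizing the likelihood of the target sentence given the image) can be illustrated with a minimal sketch. It assumes the sentence likelihood factorizes into one probability per word, so the log-likelihood is a sum of per-word log-probabilities. The function names and the toy probabilities are hypothetical and are not taken from the paper.

```python
import math

def caption_log_likelihood(token_probs):
    """Log-likelihood of one caption: the sum of the log-probabilities
    assigned to each target word, one entry per time step."""
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("each probability must be in (0, 1]")
    return sum(math.log(p) for p in token_probs)

def training_loss(batch):
    """Negative mean log-likelihood over a batch of captions.
    Minimizing this value is equivalent to maximizing the likelihood."""
    return -sum(caption_log_likelihood(probs) for probs in batch) / len(batch)

# Hypothetical probabilities a model assigned to the words of two
# reference captions (illustrative numbers only).
batch = [
    [0.9, 0.8, 0.5],
    [0.7, 0.6],
]
loss = training_loss(batch)
```

In an actual captioning system the per-word probabilities would come from a language model conditioned on image features. Here they are hard-coded so the arithmetic stays visible.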
3,728 Citations
Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge
  • 503
Fast image captioning using LSTM
  • 7
  • Highly Influenced
Learning to Caption Images with Two-Stream Attention and Sentence Auto-Encoder
Phrase-based Image Captioning
  • 91
  • Highly Influenced
Fine-grained attention for image caption generation
  • 12
  • Highly Influenced
From captions to visual concepts and back
  • 990
Image Captioning using Deep Learning
  • 2
  • Highly Influenced
Image Caption Generation with Part of Speech Guidance
  • 30
The Role of Attention Mechanism and Multi-Feature in Image Captioning
  • 1
