Image Captioning with Deep Bidirectional LSTMs

@inproceedings{Wang2016ImageCW,
  title={Image Captioning with Deep Bidirectional {LSTMs}},
  author={Cheng Wang and Haojin Yang and C. Bartz and C. Meinel},
  booktitle={Proceedings of the 24th ACM International Conference on Multimedia},
  year={2016}
}
  • Cheng Wang, Haojin Yang, C. Bartz, C. Meinel
  • Published 2016
  • Computer Science
  • Proceedings of the 24th ACM International Conference on Multimedia
  • This work presents an end-to-end trainable deep bidirectional LSTM (Long Short-Term Memory) model for image captioning. [...] Two novel deep bidirectional variant models, in which the depth of the nonlinear transition is increased in different ways, are proposed to learn hierarchical visual-language embeddings. Data augmentation techniques such as multi-crop, multi-scale and vertical mirror are proposed to prevent overfitting in training deep models.
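
The augmentation techniques named in the abstract (multi-crop, multi-scale, mirroring) can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration and is not taken from the paper: the function names, the five-crop layout (four corners plus center), the scale and crop sizes, nearest-neighbor rescaling, and the reading of "vertical mirror" as a flip about the vertical axis (left-right).

```python
import numpy as np


def multi_crop(image, crop):
    """Return five crop x crop views: four corners and the center (assumed layout)."""
    h, w = image.shape[:2]
    if crop > h or crop > w:
        raise ValueError("crop size exceeds image size")
    top, left = (h - crop) // 2, (w - crop) // 2
    offsets = [(0, 0), (0, w - crop), (h - crop, 0), (h - crop, w - crop), (top, left)]
    return [image[y:y + crop, x:x + crop] for y, x in offsets]


def rescale_nearest(image, size):
    """Resize to size x size with nearest-neighbor sampling (a simple stand-in)."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows[:, None], cols]


def mirror(image):
    """Flip about the vertical axis, i.e. left-right (assumed meaning of 'vertical mirror')."""
    return image[:, ::-1]


def augment(image, scales=(256, 288), crop=224):
    """Combine multi-scale, multi-crop and mirroring into one list of views."""
    views = []
    for size in scales:
        scaled = rescale_nearest(image, size)
        for view in multi_crop(scaled, crop):
            views.append(view)
            views.append(mirror(view))
    return views


if __name__ == "__main__":
    image = np.arange(300 * 400 * 3).reshape(300, 400, 3)
    views = augment(image)
    print(len(views), views[0].shape)
```

With two scales, five crops per scale and one mirrored copy of each crop, a single image yields 20 training views in this sketch.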
    144 Citations
    Image Captioning with Deep Bidirectional LSTMs and Multi-Task Learning
    • 48
    Image Captioning with Bidirectional Semantic Attention-Based Guiding of Long Short-Term Memory
    • 9
    Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style
    • 5
    What Convnets Make for Image Captioning?
    • 4
    Deep Hierarchical Encoder–Decoder Network for Image Captioning
    • 11
    AttResNet: Attention-based ResNet for Image Captioning
    Bi-SAN-CAP: Bi-Directional Self-Attention for Image Captioning
    • 3
