Image Captioning with Deep Bidirectional LSTMs and Multi-Task Learning

  • Cheng Wang, Haojin Yang, C. Meinel
  • Published 2018
  • Computer Science
  • ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)
  • Pages 1-20
  • Generating a novel and descriptive caption of an image is drawing increasing interest in the computer vision, natural language processing, and multimedia communities. [...] We also explore deep multimodal bidirectional models, in which we increase the depth of nonlinearity transition in different ways to learn hierarchical visual-language embeddings. Data augmentation techniques such as multi-crop, multi-scale, and vertical mirror are proposed to prevent overfitting in training deep models.
    48 Citations
    Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style
    Bi-SAN-CAP: Bi-Directional Self-Attention for Image Captioning
    Recall What You See Continually Using GridLSTM in Image Captioning
    Survey of deep learning and architectures for visual captioning—transitioning between media and natural languages
    • Chiranjib Sur
    • Computer Science
    • Multimedia Tools and Applications
    • 2019
    Dual-path Convolutional Image-Text Embeddings with Instance Loss

