Image Captioning with Clause-Focused Metrics in a Multi-modal Setting for Marketing

@article{Harzig2019ImageCW,
  title={Image Captioning with Clause-Focused Metrics in a Multi-modal Setting for Marketing},
  author={Philipp Harzig and D. Zecha and R. Lienhart and C. Kaiser and Ren{\'e} Schallner},
  journal={2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)},
  year={2019},
  pages={419-424}
}
Abstract: Automatically generating descriptive captions for images is a well-researched area in computer vision. However, existing evaluation approaches focus on measuring the similarity between two sentences, disregarding the fine-grained semantics of the captions. In our setting of images depicting persons interacting with branded products, the subject, predicate, object and the name of the branded product are important evaluation criteria for the generated captions. Generating image captions with these…
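
The truncated abstract describes clause-focused evaluation: a generated caption is judged by whether its subject, predicate, object and branded-product name match the reference, rather than by sentence-level similarity alone. As a rough illustration only (not the metric defined in the paper), the sketch below computes per-slot exact-match accuracy over caption pairs that are assumed to have already been reduced to (subject, predicate, object, brand) tuples; the Clause type, its field names and the pre-extraction step are hypothetical.

```python
# Minimal sketch of a clause-focused caption metric. Assumes each caption has
# already been parsed into a (subject, predicate, object, brand) tuple; the
# Clause type and field names are illustrative, not taken from the paper.
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Clause:
    subject: str
    predicate: str
    object: str
    brand: str


def clause_scores(pairs: Iterable[Tuple[Clause, Clause]]) -> Dict[str, float]:
    """Per-slot exact-match accuracy between generated and reference clauses."""
    slots = ("subject", "predicate", "object", "brand")
    hits = {slot: 0 for slot in slots}
    total = 0
    for generated, reference in pairs:
        total += 1
        for slot in slots:
            if getattr(generated, slot).lower() == getattr(reference, slot).lower():
                hits[slot] += 1
    if total == 0:
        return {slot: 0.0 for slot in slots}
    return {slot: hits[slot] / total for slot in slots}


if __name__ == "__main__":
    generated = Clause("woman", "drinks", "soda", "BrandX")
    reference = Clause("woman", "holds", "soda", "BrandX")
    print(clause_scores([(generated, reference)]))
    # -> {'subject': 1.0, 'predicate': 0.0, 'object': 1.0, 'brand': 1.0}
```

Reporting one score per clause slot, rather than a single sentence-similarity number, makes it visible which part of the caption (here the predicate) the model got wrong.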
