Object Hallucination in Image Captioning

    • Anna Rohrbach, Lisa Anne Hendricks, Kaylee Burns, Trevor Darrell, Kate Saenko
    • Published in EMNLP 2018
    • Computer Science
    • Abstract: Despite continuously improving performance, contemporary image captioning models are prone to "hallucinating" objects that are not actually in a scene. [...] Key result: the analysis yields several interesting findings, including that models which score best on standard sentence metrics do not always have lower hallucination, and that models which hallucinate more tend to make errors driven by language priors.
    50 Citations
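The abstract describes measuring how often a caption mentions objects absent from the image. A minimal sketch of such a hallucination metric is below; it is an assumed simplification (the paper's actual CHAIR metrics handle object extraction and synonyms, which are stubbed out here with plain word matching against a small object vocabulary).

```python
# Sketch of a CHAIR-style hallucination metric (simplified assumption:
# objects are detected by exact word match against a fixed vocabulary).

def chair_scores(captions, ground_truths, vocab):
    """captions: list of token lists; ground_truths: list of sets of
    objects actually present in each image; vocab: tracked object words.
    Returns (per-instance rate, per-sentence rate)."""
    hallucinated = 0   # hallucinated object mentions
    mentioned = 0      # all object mentions
    bad_sentences = 0  # captions with at least one hallucination
    for tokens, gt in zip(captions, ground_truths):
        objs = [t for t in tokens if t in vocab]
        halluc = [o for o in objs if o not in gt]
        mentioned += len(objs)
        hallucinated += len(halluc)
        bad_sentences += bool(halluc)
    chair_i = hallucinated / mentioned if mentioned else 0.0
    chair_s = bad_sentences / len(captions) if captions else 0.0
    return chair_i, chair_s

vocab = {"dog", "frisbee", "cat", "bench"}
caps = [["a", "dog", "catches", "a", "frisbee"],
        ["a", "cat", "on", "a", "bench"]]
gts = [{"dog", "frisbee"}, {"bench"}]  # no cat in the second image
print(chair_scores(caps, gts, vocab))  # -> (0.25, 0.5)
```

Here one of the four mentioned objects ("cat") is hallucinated, and one of the two captions contains a hallucination, matching the two returned rates.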


    Citations

    • Towards Unique and Informative Captioning of Images (Highly Influenced)
    • More Grounded Image Captioning by Distilling Image-Text Matching Model (8 citations)
    • Understanding Image Captioning Models beyond Visualizing Attention
    • Deconfounded Image Captioning: A Causal Retrospect (13 citations)
    • Fusion Models for Improved Visual Captioning


    References

    • Learning to Evaluate Image Captioning (45 citations)
    • Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data (182 citations)
    • Discriminability Objective for Training Descriptive Captions (86 citations, Highly Influential)
    • Neural Baby Talk (208 citations, Highly Influential)
    • SPICE: Semantic Propositional Image Caption Evaluation (506 citations)
    • Understanding Blind People's Experiences with Computer-Generated Captions of Social Media Images (58 citations)
    • Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training (72 citations)
    • CIDEr: Consensus-based image description evaluation (1,448 citations, Highly Influential)
    • Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations (1,641 citations)