Corpus-Guided Sentence Generation of Natural Images

Abstract

We propose a sentence generation strategy that describes images by predicting the most likely nouns, verbs, scenes, and prepositions that make up the core sentence structure. The inputs are initial noisy estimates of the objects and scenes detected in the image using state-of-the-art trained detectors. Because predicting actions directly from still images is unreliable, we instead estimate verbs using a language model trained on the English Gigaword corpus, together with probabilities of co-located nouns, scenes, and prepositions. These estimates serve as parameters of an HMM that models the sentence generation process, with hidden nodes as sentence components and image detections as the emissions. Experimental results show that our strategy of combining vision and language produces sentences that are more readable and descriptive than those produced by naive strategies using vision alone.
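To make the decoding step concrete, below is a minimal sketch (in Python, not from the paper) of selecting the most likely (noun, verb, scene, preposition) core. All candidate words and probability values are hypothetical stand-ins: in the paper, emission scores come from trained object and scene detectors, and transition probabilities come from Gigaword co-occurrence counts. Since the chain has a fixed short length, exhaustive scoring here is equivalent to Viterbi decoding on the HMM.

import math
from itertools import product

# Hypothetical candidate values for each hidden sentence component.
nouns = ["dog", "person"]
verbs = ["run", "sit"]
scenes = ["park", "room"]
preps = ["in", "at"]

# Emission log-probs: detector confidence that the image supports each
# noun/scene. Verbs and prepositions are not detected directly, so their
# emissions are treated as uniform and omitted here.
emit = {
    ("noun", "dog"): math.log(0.7), ("noun", "person"): math.log(0.3),
    ("scene", "park"): math.log(0.6), ("scene", "room"): math.log(0.4),
}

# Transition probabilities from corpus co-occurrence (hypothetical numbers),
# e.g. P(verb | noun), P(scene | verb), P(prep | scene).
p_verb_given_noun = {("dog", "run"): 0.6, ("dog", "sit"): 0.4,
                     ("person", "run"): 0.5, ("person", "sit"): 0.5}
p_scene_given_verb = {("run", "park"): 0.7, ("run", "room"): 0.3,
                      ("sit", "park"): 0.2, ("sit", "room"): 0.8}
p_prep_given_scene = {("park", "in"): 0.8, ("park", "at"): 0.2,
                      ("room", "in"): 0.9, ("room", "at"): 0.1}

def score(n, v, s, p):
    """Joint log-score of one (noun, verb, scene, preposition) core."""
    return (emit[("noun", n)]
            + math.log(p_verb_given_noun[(n, v)])
            + math.log(p_scene_given_verb[(v, s)])
            + emit[("scene", s)]
            + math.log(p_prep_given_scene[(s, p)]))

# With a short fixed-length chain, exhaustive search over all quadruples
# returns the same argmax as Viterbi decoding.
best = max(product(nouns, verbs, scenes, preps), key=lambda t: score(*t))
print(best)  # ('dog', 'run', 'park', 'in')

The chosen quadruple would then be realized as a surface sentence such as "The dog runs in the park"; the point of the sketch is only that noisy vision scores and corpus statistics are combined in a single joint objective.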




188 Citations

Semantic Scholar estimates that this publication has 188 citations based on the available data.


Cite this paper

@inproceedings{Yang2011CorpusGuidedSG,
  title     = {Corpus-Guided Sentence Generation of Natural Images},
  author    = {Yezhou Yang and Ching Lik Teo and Hal Daum{\'e} and Yiannis Aloimonos},
  booktitle = {EMNLP},
  year      = {2011}
}