Composing Simple Image Descriptions using Web-scale N-grams


Studying natural language, and especially how people describe the world around them, can help us better understand the visual world. In turn, it can also help us in the quest to generate natural language that describes this world in a human manner. We present a simple yet effective approach to automatically compose image descriptions from computer-vision-based inputs using web-scale n-grams. Unlike most previous work, which summarizes or retrieves pre-existing text relevant to an image, our method composes sentences entirely from scratch. Experimental results indicate that it is viable to generate simple textual descriptions that are pertinent to the specific content of an image while permitting creativity in the description, making for more human-like annotations than previous approaches.


Cite this paper

@inproceedings{Li2011ComposingSI,
  title={Composing Simple Image Descriptions using Web-scale N-grams},
  author={Siming Li and Girish Kulkarni and Tamara L. Berg and Alexander C. Berg and Yejin Choi},
  booktitle={CoNLL},
  year={2011}
}