Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild

@inproceedings{Thomason2014IntegratingLA,
  title={Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild},
  author={Jesse Thomason and Subhashini Venugopalan and Sergio Guadarrama and Kate Saenko and Raymond J. Mooney},
  booktitle={COLING},
  year={2014}
}
This paper integrates techniques in natural language processing and computer vision to improve recognition and description of entities and activities in real-world videos. We propose a strategy for generating textual descriptions of videos by using a factor graph to combine visual detections with language statistics. We use state-of-the-art visual recognition systems to obtain confidences on entities, activities, and scenes present in the video. Our factor graph model combines these detection… CONTINUE READING
Highly Cited
This paper has 118 citations. REVIEW CITATIONS

5 Figures & Tables

Topics

Statistics

0204020142015201620172018
Citations per Year

118 Citations

Semantic Scholar estimates that this publication has 118 citations based on the available data.

See our FAQ for additional information.