Grounded Language Learning from Video Described with Sentences

Haonan Yu and Jeffrey Mark Siskind
We present a method that learns representations for word meanings from short video clips paired with sentences. Unlike prior work on learning language from symbolic input, our input consists of video of people interacting with multiple complex objects in outdoor environments. Unlike prior computer-vision approaches that learn from videos with verb labels or images with noun labels, our labels are sentences containing nouns, verbs, prepositions, adjectives, and adverbs. The correspondence…




