Jinlian Wei

We present a system that produces sentential descriptions of video: who did what to whom, and where and how they did it. Action class is rendered as a verb, participant objects as noun phrases, properties of those objects as adjectival modifiers in those noun phrases, spatial relations between those participants as prepositional phrases, and characteristics...
We present an approach to labeling short video clips with English verbs as event descriptions. A key distinguishing aspect of this work is that it labels videos with verbs that describe the spatiotemporal interaction between event participants, humans and objects interacting with each other, abstracting away all object-class information and fine-grained...