Learning and parsing video events with goal and intent prediction


In this paper, we present a framework for parsing video events with a stochastic Temporal And-Or Graph (T-AOG) and for unsupervised learning of the T-AOG from video. This T-AOG represents a stochastic event grammar. The alphabet of the T-AOG consists of a set of grounded spatial relations, including the poses of agents and their interactions with objects in the…
DOI: 10.1016/j.cviu.2012.12.003




