Learn More
We introduce a new large-scale video dataset designed to assess the performance of diverse visual event recognition algorithms with a focus on continuous visual event recognition (CVER) in outdoor areas with wide coverage. Previous datasets for action recognition are unrealistic for real-world surveillance because they consist of short clips showing one(More)
Complex activities typically consist of multiple primitive events happening in parallel or sequentially over a period of time. Understanding such activities requires recognizing not only each individual event but, more importantly, capturing their spatiotemporal dependencies over different time intervals. Most of the current graphical model-based approaches(More)
agents are interacting in a time-varying manner. In surveillance, people form into queues and others groups, and transfer baggage. Team sports involve multiple players acting in a coordinated effort. Our goal is to efficiently model and recognize such coordinated activities in video. In this paper, we focus on the domain of American football, as it presents(More)
We present a novel approach to learning motion behavior in video, and detecting abnormal behavior, using hierarchical clustering of hidden Markov models (HMMs). A continuous stream of track data is used for online and on-demand creation and training of HMMs, where tracks may be of highly variable length and scenes may be very complex with an unknown number(More)
We are interested in modeling and recognizing complex behaviors in video, where multiple agents are interacting in a time-varying manner and in a spatially-localized domain such as American football. Our approach pushes the model complexity onto the observations by using a multi-variate kernel density while maintaining a simple HMM model. The temporal(More)
Modeling interactions of multiple co-occurring objects in a complex activity is becoming increasingly popular in the video domain. The Dynamic Bayesian Network (DBN) has been applied to this problem in the past due to its natural ability to statistically capture complex temporal dependencies. However, standard DBN structure learning algorithms are(More)
We present a method to detect and recognize functional scene elements in video scenes. A functional scene element is a location or object that is primarily defined by its specific function or purpose, rather than its appearance or shape. Our method combines techniques from video scene analysis with functional recognition to decompose a video scene into its(More)
When a video surveillance scene is observed over time, motion patterns can be learned and used to detect abnormal activity. The primary challenges include fragmented object tracks, sparse training data, unbalanced training data, and even temporal localization of a deviation. We present a novel approach to learning motion behavior in video, and detecting(More)
agents co-exist and are interacting in a time-varying manner. For example, in the surveillance domain one person may open a door of a vehicle so another person can load an object before they both enter the vehicle. Similarly, team sports involve multiple players acting in a coordinated manner. Our goal is to model and recognize such coordinated activities(More)