Learn More
We propose to leverage multiple sources of information to compute an estimate of the number of individuals present in an extremely dense crowd visible in a single image. Due to problems including perspective, occlusion, clutter, and few pixels per person, counting by human detection in such images is almost impossible. Instead, our approach relies on(More)
—In this paper, we propose a novel method that exploits correlation between audiovisual dynamics of a video to segment and localize objects that are the dominant source of audio. Our approach consists of a two-step spatiotemporal segmentation mechanism that relies on velocity and acceleration of moving objects as visual features. Each frame of the video is(More)
We present a method for multi-target tracking that exploits the persistence in detection of object parts. While the implicit representation and detection of body parts have recently been leveraged for improved human detection, ours is the first method that attempts to temporally constrain the location of human body parts with the express purpose of(More)
We present a novel method for the discovery and statistical representation of motion patterns in a scene observed by a static camera. Related methods involving learning of patterns of activity rely on trajectories obtained from object detection and tracking systems, which are unreliable in complex scenes of crowded motion. We propose a mixture model(More)
This paper presents a novel framework for tracking thousands of vehicles in high resolution, low frame rate, multiple camera aerial videos. The proposed algorithm avoids the pitfalls of global minimization of data association costs and instead maintains multiple object-centric associations for each track. Representation of object state in terms of many to(More)
This paper proposes a novel representation of articulated human actions and gestures and facial expressions. The main goals of the proposed approach are: 1) to enable recognition using very few examples, i.e., one or k-shot learning, and 2) meaningful organization of unlabeled datasets by unsupervised clustering. Our proposed representation is obtained by(More)
In this paper we present a novel approach for detection of independently moving foreground objects in non-planar scenes captured by a moving camera. We avoid the traditional assumptions that the stationary background of the scene is planar, or that it can be approximated by dominant single or multiple planes, or that the camera used to capture the video is(More)
This paper proposes a novel method for recognition and classification of events represented by Mixture distributions of location and flow. The main idea is to classify observed events into semantically meaningful groups even when motion is observed from distinct viewpoints. Events in the proposed framework are modeled as motion patterns, which are(More)
We propose a novel method to model and learn the scene activity, observed by a static camera. The proposed model is very general and can be applied for solution of a variety of problems. The motion patterns of objects in the scene are modeled in the form of a multivariate nonparametric probability density function of spatiotemporal variables (object(More)
Efficient modeling of actions is critical for recognizing human actions. Recently , bag of video words (BoVW) representation, in which features computed around spatiotemporal interest points are quantized into video words based on their appearance similarity, has been widely and successfully explored. The performance of this representation however, is(More)