Occlusion and lack of visibility in crowded and cluttered scenes make it difficult to track individual people correctly and consistently, particularly in a single view. We present a multi-view approach to solving this problem. In our approach we neither detect nor track objects from any single camera or camera pair; rather, evidence is gathered from all the…
Occlusion and lack of visibility in dense crowded scenes make it very difficult to track individual people correctly and consistently. This problem is particularly hard to tackle in single camera systems. We present a multi-view approach to tracking people in crowded scenes where people may be partially or completely occluding each other. Our approach is to…
In this paper, a novel object class detection method based on 3D object modeling is presented. Instead of using a complicated mechanism for relating multiple 2D training views, the proposed method establishes spatial connections between these views by mapping them directly to the surface of a 3D model. The 3D shape of an object is reconstructed by using a…
In this paper we present a novel approach using a 4D (x,y,z,t) action feature model (4D-AFM) for recognizing actions from arbitrary views. The 4D-AFM elegantly encodes shape and motion of actors observed from multiple views. The modeling process starts with reconstructing 3D visual hulls of actors at each time instant. Spatiotemporal action features are…
This paper presents a purely image-based approach to fusing foreground silhouette information from multiple arbitrary views. Our approach does not require 3D constructs like camera calibration to carve out 3D voxels or project visual cones in 3D space. Using planar homographies and foreground likelihood information from a set of arbitrary views, we show…
Finding the location where a picture was taken is an important problem for a variety of applications, including surveying, interactive traveling, and homeland security. The task becomes intractable, though, when the area under investigation reaches city/town size. The amount of data (pictures/videos) required to visually map a city,…
We present a novel approach for multi-modal affect analysis in human interactions that is capable of integrating data from multiple modalities while also taking into account temporal dynamics. Our fusion approach, Joint Hidden Conditional Random Fields (JHCRFs), combines the advantages of purely feature-level (early) fusion approaches with late…
We propose a novel hybrid model that exploits the strength of discriminative classifiers along with the representational power of generative models. Our focus is on detecting multimodal events in time-varying sequences. Discriminative classifiers have been shown to achieve higher performance than the corresponding generative likelihood-based classifiers.…