Learn More
Egocentric cameras can be used to benefit such tasks as analyzing fine motor skills, recognizing gestures and learning about hand-object manipulation. To enable such technology , we believe that the hands must detected on the pixel-level to gain important information about the shape of the hands and fingers. We show that the problem of pixel-wise hand(More)
From the viewpoint of an intelligent video surveillance system , the high-level recognition of human activity requires a priori hierarchical domain knowledge as well as a means of reasoning based on that knowledge. We approach the problem of human activity recognition based on the understanding that activities are hierarchical, temporally constrained and(More)
Our aim is to show how state-of-the-art computer vision techniques can be used to advance prehensile analysis (i.e., understanding the functionality of human hands). Prehen-sile analysis is a broad field of multidisciplinary interest, where researchers painstakingly manually analyze hours of hand-object interaction videos to understand the mechanics of hand(More)
In recent years stochastic context-free grammars have been shown to be effective in modeling human activities because of the hierarchical structures they represent. However , most of the research in this area has yet to address the issue of learning the activity grammars from a noisy input source, namely, video. In this paper, we present a framework for(More)
We consider the problem of designing a scene-specific pedestrian detector in a scenario where we have zero instances of real pedestrian data (i.e., no labeled real data or unsupervised real data). This scenario may arise when a new surveillance system is installed in a novel location and a scene-specific pedestrian detector must be trained prior to any(More)
What is a good vector representation of an object? We believe that it should be generative in 3D, in the sense that it can produce new 3D objects; as well as be predictable from 2D, in the sense that it can be perceived from 2D images. We propose a novel architecture, called the TL-embedding network , to learn an embedding space with these properties. The(More)