Learn More
We address the task of pixel-level hand detection in the context of ego-centric cameras. Extracting hand regions in ego-centric videos is a critical step for understanding hand-object manipulation and analyzing hand-eye coordination. However, in contrast to traditional applications of hand detection, such as gesture interfaces or sign-language recognition,(More)
Egocentric cameras can be used to benefit such tasks as analyzing fine motor skills, recognizing gestures and learning about hand-object manipulation. To enable such technology, we believe that the hands must detected on the pixel-level to gain important information about the shape of the hands and fingers. We show that the problem of pixel-wise hand(More)
From the viewpoint of an intelligent video surveillance system , the high-level recognition of human activity requires a priori hierarchical domain knowledge as well as a means of reasoning based on that knowledge. We approach the problem of human activity recognition based on the understanding that activities are hierarchical, temporally constrained and(More)
We consider the problem of designing a scene-specific pedestrian detector in a scenario where we have zero instances of real pedestrian data (i.e., no labeled real data or unsupervised real data). This scenario may arise when a new surveillance system is installed in a novel location and a scene-specific pedestrian detector must be trained prior to any(More)
Our aim is to show how state-of-the-art computer vision techniques can be used to advance prehensile analysis (i.e., understanding the functionality of human hands). Prehensile analysis is a broad field of multi-disciplinary interest, where researchers painstakingly manually analyze hours of hand-object interaction videos to understand the mechanics of hand(More)
We bring together ideas from recent work on feature design for egocentric action recognition under one framework by exploring the use of deep convolutional neural networks (CNN). Recent work has shown that features such as hand appearance, object attributes, local hand motion and camera ego-motion are important for characterizing first-person actions. To(More)
What is a good vector representation of an object? We believe that it should be generative in 3D, in the sense that it can produce new 3D objects; as well as be predictable from 2D, in the sense that it can be perceived from 2D images. We propose a novel architecture, called the TL-embedding network , to learn an embedding space with these properties. The(More)
Maximum entropy inverse optimal control (MaxEnt IOC) is an effective means of discovering the underlying cost function of demonstrated human activity and can be used to predict human behavior over low-dimensional state spaces (i.e., forecasting of 2D trajectories). To enable inference in very large state spaces, we introduce an approximate MaxEnt IOC(More)