Nicolas H. Lehment

CCTV systems have been introduced in most public spaces in order to increase security. Video outputs are observed by human operators where possible, but are mostly used as a forensic tool. It therefore seems desirable to automate video surveillance systems so that potentially dangerous situations can be detected as soon as possible. Multi-camera systems have ...
Current experiments with HCIs have shown a high demand for more natural interaction paradigms. Gestures are considered the most important cue besides speech. To recognize gestures, it is necessary to extract meaningful motion features from the body. Up to now, mostly marker-based tracking systems have been used in virtual reality environments, ...
This paper describes the TUM approaches for violent scenes detection in movies, submitted for the MediaEval 2012 Affect Challenge. Score fusion combines Support Vector Machine (SVM) confidence scores assigned to short fixed-length windows within each movie shot. SVM predictors are trained for the acoustic and visual channels. For the acoustic channel, a ...
The reliable detection and tracking of objects, in particular humans, in video sequences is a requirement for video surveillance systems. This step enables automated threat detection systems to analyze trajectories and motion patterns. Systems based on multiple overlapping fields of view have emerged in recent years. These usually rely on ...
While monocular gesture recognition is slowly reaching maturity, the inclusion of 3D gestures remains a challenge. To enable robust and versatile depth-enabled gestures, a depth-image-based tracking approach is developed. Using a model-based annealing particle filter approach, the pose of a single subject is retrieved and tracked over longer image and ...
The observation likelihood approximation is a central problem in stochastic human pose tracking. In this article, we present a new approach to quantify the correspondence between hypothetical and observed human poses in depth images. Our approach is based on segmented point clouds, enabling accurate approximations even under conditions of self-occlusion and ...
Without doubt, general video and sound, as found in large multimedia archives, carry emotional information. Thus, audio and video retrieval by certain emotional categories or dimensions could play a central role for tomorrow's intelligent systems, enabling search for movies with a particular mood, computer-aided scene and sound design in order to elicit ...
In this paper, we present a system for detecting unusual events in smart home environments. A primary application is to prolong independent living for elderly people in their homes. We show how to effectively combine information from multiple heterogeneous sensors that are typically present in a smart home scenario. Data fusion is done in a 3D voxel ...