Rainer Stiefelhagen

Simultaneous tracking of multiple persons in real-world environments is an active research field and several approaches have been proposed, based on a variety of features and algorithms. Recently, there has been a growing interest in organizing systematic evaluations to compare the various techniques. Unfortunately, the lack of common metrics for measuring …
A user's focus of attention plays an important role in human-computer interaction applications, such as ubiquitous computing environments and intelligent spaces, where the user's goal and intent have to be continuously monitored. We are interested in modeling people's focus of attention in a meeting situation. We propose to model participants' focus of …
In this paper, we present an approach for recognizing pointing gestures in the context of human–robot interaction. In order to obtain input features for gesture recognition, we perform visual tracking of head, hands and head orientation. Given the images provided by a calibrated stereo camera, color and disparity information are integrated into a …
In recent years, several authors have reported that spectral saliency detection methods provide state-of-the-art performance in predicting human gaze in images (see, e.g., [1–3]). We systematically integrate and evaluate quaternion DCT- and FFT-based spectral saliency detection [3,4], weighted quaternion color space components [5], and the use of multiple …
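The core of DCT-based spectral saliency can be sketched for a single grayscale channel (the quaternion color variant evaluated in the abstract above is omitted, and the exact smoothing used in the paper is an assumption):

```python
import numpy as np
from scipy.fft import dctn, idctn


def dct_signature_saliency(gray):
    """Saliency map from the sign of an image's DCT coefficients
    (the 'image signature' idea): reconstruct from sign(DCT(image)),
    square, and normalize. Single-channel sketch only."""
    signature = np.sign(dctn(gray, norm="ortho"))   # keep only coefficient signs
    recon = idctn(signature, norm="ortho")          # back to the spatial domain
    sal = recon ** 2                                # energy of the reconstruction
    return sal / sal.max()                          # normalize to [0, 1]
```

In practice the squared reconstruction is additionally blurred with a Gaussian kernel before normalization; that step is left out here for brevity.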
We address the problem of person identification in TV series. We propose a unified learning framework for multi-class classification which incorporates labeled and unlabeled data, and constraints between pairs of features in the training. We apply the framework to train multinomial logistic regression classifiers for multi-class face recognition. The method …
We introduce the MovieQA dataset which aims to evaluate automatic story comprehension from both video and text. The dataset consists of 14,944 questions about 408 movies with high semantic diversity. The questions range from simpler "Who" did "What" to "Whom", to "Why" and "How" certain events occurred. Each question comes with a set of five possible …
In this paper, a local appearance based face recognition algorithm is proposed. In the proposed algorithm, local information is extracted using a block-based discrete cosine transform. The obtained local features are combined both at the feature level and at the decision level. The performance of the proposed algorithm is tested on the Yale and CMU PIE face …
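The block-based DCT feature extraction described above can be sketched as follows (block size, coefficient count, and the low-frequency selection order are illustrative assumptions; the paper's exact coefficient selection and normalization may differ):

```python
import numpy as np
from scipy.fft import dctn


def block_dct_features(gray, block=8, n_coeffs=10):
    """Local appearance features: a 2-D DCT per non-overlapping block,
    keeping the n_coeffs lowest-frequency coefficients of each block
    and concatenating them into one feature vector."""
    h, w = gray.shape
    h, w = h - h % block, w - w % block            # drop partial border blocks
    # order coefficients by frequency index sum (a simple zigzag substitute)
    order = np.argsort(
        np.add.outer(np.arange(block), np.arange(block)), axis=None
    )
    feats = []
    for i in range(0, h, block):
        for j in range(0, w, block):
            c = dctn(gray[i:i + block, j:j + block], norm="ortho")
            feats.append(c.flatten()[order][:n_coeffs])
    return np.concatenate(feats)
```

For a 32×32 face image with 8×8 blocks and 10 coefficients per block, this yields a 160-dimensional vector that can be fed to a classifier directly (feature-level fusion) or classified per block with the votes merged (decision-level fusion).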
Context-sensing for context-aware HCI challenges traditional sensor fusion methods with dynamic sensor configuration and measurement requirements commensurate with human perception. The Dempster-Shafer theory of evidence has uncertainty management and inference mechanisms analogous to our human reasoning process. Our Sensor Fusion for Context-aware …
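At the heart of Dempster-Shafer evidence fusion is Dempster's rule of combination, which can be sketched for two mass functions over focal sets (an illustrative sketch, not the paper's full architecture):

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination for two basic mass assignments,
    given as dicts mapping frozenset focal elements to masses."""
    combined = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:                                # agreeing evidence
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:                                    # conflicting evidence
                conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict; masses cannot be combined")
    # renormalize by the non-conflicting mass
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


# Two sensors both lean toward hypothesis A; combining sharpens the belief.
m1 = {frozenset({"A"}): 0.6, frozenset({"A", "B"}): 0.4}
m2 = {frozenset({"A"}): 0.5, frozenset({"A", "B"}): 0.5}
fused = dempster_combine(m1, m2)
```

Here the fused mass on {A} rises to 0.8, illustrating how independent evidence sources reinforce each other under the rule.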
This paper presents an overview of our work on tracking focus of attention in meeting situations. We have developed a system capable of estimating participants' focus of attention from multiple cues. In our system we employ an omni-directional camera to simultaneously track the faces of participants sitting around a meeting table and use neural networks to …
In this paper we present our ongoing work in building technologies for natural multimodal human-robot interaction. We present our systems for spontaneous speech recognition, multimodal dialogue processing and visual perception of a user, which includes the recognition of pointing gestures as well as the recognition of a person's head orientation. Each of …