Cristian Canton-Ferrer

This paper presents a novel view-independent approach to recognizing the gestures of several people in low-resolution sequences from multiple calibrated cameras. In contrast to other multi-ocular gesture recognition systems that classify a fusion of features coming from the different views, our system performs a data…
This paper presents a novel approach to data association in the context of pedestrian tracking, introducing a two-stage learning scheme to match pairs of detections. First, a Siamese convolutional neural network (CNN) is trained to learn descriptors encoding local spatio-temporal structures between the two input image patches,…
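The association stage described above can be sketched at the descriptor level. The snippet below assumes the descriptors have already been produced by a trained Siamese CNN; the greedy matcher and the distance threshold are illustrative assumptions, a minimal sketch rather than the paper's actual algorithm.

```python
# Sketch of the second stage of a two-stage tracker: detections are
# matched across frames by comparing fixed-length appearance descriptors.
# The descriptors stand in for the output of a trained Siamese CNN; the
# greedy matcher and `max_dist` threshold are illustrative choices.
import math

def descriptor_distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def associate(prev_descs, curr_descs, max_dist=1.0):
    """Greedily match previous-frame detections to current-frame ones.

    Returns a list of (prev_index, curr_index) pairs; unmatched
    detections are left out (new tracks or terminated tracks).
    """
    pairs = sorted(
        (descriptor_distance(p, c), i, j)
        for i, p in enumerate(prev_descs)
        for j, c in enumerate(curr_descs)
    )
    used_prev, used_curr, matches = set(), set(), []
    for dist, i, j in pairs:
        if dist <= max_dist and i not in used_prev and j not in used_curr:
            matches.append((i, j))
            used_prev.add(i)
            used_curr.add(j)
    return matches

# Toy descriptors: detection 0 reappears slightly changed, detection 1
# has no close match in the new frame.
prev = [[0.0, 0.0], [5.0, 5.0]]
curr = [[0.1, -0.1], [9.0, 9.0]]
print(associate(prev, curr))  # [(0, 0)]
```

In a full system the greedy pass would typically be replaced by an optimal assignment (e.g. the Hungarian algorithm), but the interface is the same: descriptors in, index pairs out.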
Acoustic event detection (AED) in a meeting-room environment becomes a difficult task when signals show a large proportion of temporally overlapping sounds, as in seminar-type data, where the acoustic events often occur simultaneously with speech. Whenever the event that produces the sound is related to a given position or movement, video…
Acoustic event detection (AED) aims at determining the identity of sounds and their temporal position in audio signals. When applied to spontaneously generated acoustic events, AED based only on audio information shows a large number of errors, mostly due to temporal overlaps. In fact, temporal overlaps accounted for more than 70% of the errors in…
This paper presents a new approach to the problem of simultaneously tracking several people in low-resolution sequences from multiple calibrated cameras. Spatial redundancy is exploited to generate a discrete 3D binary representation of the scene. A particle filtering scheme adapted to the incoming 3D discrete data is proposed. A volume likelihood function…
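A particle filter over a discrete 3D binary volume can be sketched minimally as follows. The neighbourhood radius, particle count, random-walk motion model and resampling scheme are illustrative assumptions, and the volume likelihood is simplified here to the fraction of occupied voxels around each particle; this is a toy sketch, not the paper's design.

```python
# Minimal sketch of a particle filter on a discrete 3D binary volume.
# A particle is a candidate person position; its weight is a simplified
# volume likelihood: the fraction of occupied voxels in a fixed-radius
# neighbourhood around it.
import random

random.seed(0)  # deterministic run for reproducibility

def volume_likelihood(particle, occupied, radius=2):
    """Fraction of voxels around `particle` that are occupied."""
    x0, y0, z0 = particle
    inside = total = 0
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            for dz in range(-radius, radius + 1):
                total += 1
                if (x0 + dx, y0 + dy, z0 + dz) in occupied:
                    inside += 1
    return inside / total

def filter_step(particles, occupied, noise=1):
    """One predict-weight-resample cycle (random-walk motion model)."""
    moved = [(x + random.randint(-noise, noise),
              y + random.randint(-noise, noise),
              z + random.randint(-noise, noise)) for x, y, z in particles]
    weights = [volume_likelihood(p, occupied) for p in moved]
    if sum(weights) == 0:  # degenerate case: keep the prediction as-is
        return moved
    return random.choices(moved, weights=weights, k=len(moved))

# Toy scene: one person as a 3x3x3 occupied blob centred at (5, 5, 5).
occupied = {(5 + dx, 5 + dy, 5 + dz)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)}
particles = [(random.randint(3, 7), random.randint(3, 7), random.randint(3, 7))
             for _ in range(50)]
for _ in range(5):
    particles = filter_step(particles, occupied)
estimate = tuple(sum(c) / len(particles) for c in zip(*particles))
print(estimate)  # mean particle position, near the blob centre
```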
Reliable measures of person positions are needed for computational perception of human activities taking place in a smart-room environment. In this work, we present the Person Tracking systems developed at UPC for the audio, video and audio-video modalities in the context of the EU-funded CHIL project research activities. The aim of the designed systems, and…
Acoustic events produced in meeting environments may contain useful information for perceptually aware interfaces and multimodal behavior analysis. In this paper, a system to detect and recognize these events from a multimodal perspective is presented, combining information from multiple cameras and microphones. First, spectral and temporal features are…
This paper presents two approaches to the problem of simultaneously tracking several people in low-resolution sequences from multiple calibrated cameras. Spatial redundancy is exploited to generate a discrete 3D binary representation of the foreground objects in the scene. Color information obtained from the zenithal view is added to this 3D information…
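The 3D binary reconstruction step that several of these abstracts mention can be illustrated with a shape-from-silhouette sketch: a voxel is kept only if it projects inside the foreground silhouette of every camera. Real systems use calibrated perspective projections; the two orthographic views below are a hypothetical simplification for illustration only.

```python
# Sketch of how spatial redundancy across views yields a discrete 3D
# binary volume (shape-from-silhouette): a voxel survives only if it
# falls inside the foreground silhouette of every view.

def carve(grid_size, silhouettes):
    """Return the set of voxels consistent with all silhouettes.

    `silhouettes` maps a projection function (voxel -> 2D pixel) to the
    set of foreground pixels observed in that view.
    """
    volume = set()
    for x in range(grid_size):
        for y in range(grid_size):
            for z in range(grid_size):
                voxel = (x, y, z)
                if all(project(voxel) in fg
                       for project, fg in silhouettes.items()):
                    volume.add(voxel)
    return volume

# Two idealized orthographic views: top (drops z) and front (drops y).
top_view = {(x, y) for x in range(2, 5) for y in range(2, 5)}
front_view = {(x, z) for x in range(2, 5) for z in range(1, 3)}
silhouettes = {
    (lambda v: (v[0], v[1])): top_view,
    (lambda v: (v[0], v[2])): front_view,
}
volume = carve(8, silhouettes)
print(len(volume))  # 18 voxels: the 3 x 3 x 2 intersection
```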
This article presents a multimodal approach to head pose estimation of individuals in environments equipped with multiple cameras and microphones, such as SmartRooms or automatic video conferencing. Determining an individual's head orientation is the basis for many forms of more sophisticated interaction between humans and technical devices and can also be…
The detection of acoustic events (AEs) that are naturally produced in a meeting room may help to describe the human and social activity that takes place in it. When applied to spontaneous recordings, the detection of AEs from audio information alone shows a large number of errors, mostly due to temporal overlapping of sounds. In this paper, a…
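The audio-video combination that recurs across these abstracts can be illustrated with a toy late-fusion step: per-class scores from an audio classifier and a video classifier are merged by a weighted sum, which helps when sounds overlap in time but the corresponding motion is visible. The classes, score values and weight below are invented for illustration, not taken from the papers.

```python
# Toy decision-level (late) fusion for multimodal acoustic event
# detection: each modality produces a per-class score dictionary and
# the two are combined by a weighted sum. The weight is a hypothetical
# tuning parameter.

def fuse_scores(audio_scores, video_scores, audio_weight=0.6):
    """Weighted-sum fusion of two per-class score dictionaries."""
    classes = set(audio_scores) | set(video_scores)
    return {
        c: audio_weight * audio_scores.get(c, 0.0)
           + (1 - audio_weight) * video_scores.get(c, 0.0)
        for c in classes
    }

# Audio alone confuses "applause" with overlapping "speech"; the video
# evidence of hand motion tips the decision.
audio = {"speech": 0.55, "applause": 0.45}
video = {"speech": 0.20, "applause": 0.80}
fused = fuse_scores(audio, video)
print(max(fused, key=fused.get))  # applause
```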