Jordi Robert-Ribes

The efficacy of audio-visual interactions in speech perception comes from two kinds of factors. First, at the information level, there is some "complementarity" of audition and vision: it seems that some speech features, mainly concerned with manner of articulation, are best transmitted by the audio channel, while some other features, mostly describing …
We present a prototype that enables the generation of hyperlinks between audio and the corresponding transcript. The main issue in generating such hyperlinks is determining common time points in the transcript and the audio (a task also known as alignment). The system is speaker-independent and can deal with inexact transcripts. It combines inaccurate modules …
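The abstract above does not spell out the alignment method, but a common way to anchor a transcript to timed recognizer output is dynamic-programming word alignment: align the (possibly inaccurate) ASR hypothesis against the (possibly inexact) transcript, then treat exact word matches as time anchors for hyperlinks. The sketch below is illustrative only, not the authors' system; the function names and the toy data are assumptions.

```python
def align_words(asr_words, transcript_words):
    """Align timed ASR words to transcript words with Levenshtein
    dynamic programming; exact matches become (transcript index, time)
    anchor points usable as hyperlink targets.

    asr_words: list of (word, start_time) pairs from a recognizer.
    transcript_words: list of words from the (inexact) transcript.
    """
    n, m = len(asr_words), len(transcript_words)
    # dp[i][j] = minimal edit cost aligning first i ASR words
    # to first j transcript words.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if asr_words[i - 1][0] == transcript_words[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + sub,  # match / substitute
                           dp[i - 1][j] + 1,        # delete ASR word
                           dp[i][j - 1] + 1)        # insert transcript word
    # Backtrace, collecting exact matches as anchors.
    anchors, i, j = [], n, m
    while i > 0 and j > 0:
        if (asr_words[i - 1][0] == transcript_words[j - 1]
                and dp[i][j] == dp[i - 1][j - 1]):
            anchors.append((j - 1, asr_words[i - 1][1]))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j - 1] + 1:
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return list(reversed(anchors))

# Toy example: the transcript has an extra word the audio lacks.
asr = [("the", 0.0), ("cat", 0.4), ("sat", 0.9)]
ref = ["the", "cat", "sat", "down"]
print(align_words(asr, ref))  # → [(0, 0.0), (1, 0.4), (2, 0.9)]
```

Because the alignment tolerates insertions, deletions, and substitutions on both sides, it degrades gracefully when either the recognizer or the transcript is inaccurate, which matches the robustness goal stated in the abstract.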
This document has been prepared in the ESPRIT BRA No. 8579, Multimodal Integration for Advanced Multimedia Interfaces (in the following referred to as MIAMI), in order to serve as a basis for future work. The basic terms which will be used in MIAMI will be defined, and an overview of man-machine interfaces will be given. The term "taxonomy" is used in the …
Though a large amount of psychological and physiological evidence of audio-visual integration in speech has been collected in the last 20 years, there is no agreement about the nature of the fusion process. We present the main experimental data, and describe the various models proposed in the literature, together with a number of studies in the field of …
We present a novel algorithm for the robust and reliable automatic extraction of lip feature points for speechreading. The algorithm uses a combination of colour information in the image data and knowledge about the structure of the mouth area to find certain feature points on the inner lip contour. A new confidence measure quantifying how well the feature …
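The abstract names the two cues — colour information and knowledge of mouth structure — without giving details. A minimal sketch of how those cues are commonly combined: lips are redder than surrounding skin, so a pseudo-hue threshold r/(r+g) can segment candidate lip pixels, after which a simple structural rule (topmost and bottommost lip pixel per column) yields crude contour points. This is an assumption-laden illustration, not the paper's algorithm; the threshold value and the synthetic image are made up.

```python
def lip_mask(image, thresh=0.6):
    """Mark pixels whose pseudo-hue r/(r+g) exceeds a threshold.
    Lip pixels are typically redder than skin, so this colour cue
    roughly separates the mouth from its surroundings.

    image: rows of (r, g, b) tuples.
    """
    return [[1 if r / (r + g + 1e-9) > thresh else 0
             for (r, g, b) in row]
            for row in image]

def lip_contour_points(mask):
    """Per column, return the topmost and bottommost lip pixels —
    a crude structural stand-in for upper/lower lip feature points."""
    points = []
    for x in range(len(mask[0])):
        ys = [y for y in range(len(mask)) if mask[y][x]]
        if ys:
            points.append((x, min(ys), max(ys)))
    return points

# Tiny synthetic 3x3 image: skin-coloured rows around one "lip" row.
img = [[(90, 80, 70)] * 3,
       [(200, 60, 60)] * 3,
       [(90, 80, 70)] * 3]
print(lip_contour_points(lip_mask(img)))  # → [(0, 1, 1), (1, 1, 1), (2, 1, 1)]
```

A real system would refine such candidates (e.g. with the paper's confidence measure) rather than trust a fixed threshold, since lighting and skin tone shift the colour statistics considerably.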
We have recently proposed a new algorithm for the automatic extraction of lip feature points. Based on their positions, parameters describing the shape of the mouth are derived. Since the algorithm is based on a stereo vision face tracking system, all measurements are in real-world distances. In this paper, we evaluate the accuracy of the automatic feature …
Closed captioning dramatically improves deaf people's enjoyment of television shows, and appears to augment the auditory signal for people with some degree of hearing impairment. However, reports from people with mild to severe hearing loss suggest that when there is a delay between the audio track and the caption, perceivers are confused unless they turn …
Speech processing can be of great help for indexing and archiving TV broadcast material. Broadcasting station standards will soon be digital, and there will be a huge increase in the use of speech processing techniques for maintaining the archives as well as accessing them. This paper starts with a review of several techniques used for classification of speech …