Jacek C. Wojdel

This paper describes the gathering and availability of an audio-visual speech corpus for the Dutch language. The corpus was prepared with multi-modal speech recognition in mind and is currently used in our research on lip-reading and bimodal speech recognition. It also contains the prompts used in the well-established POLYPHONE corpus and therefore …
In this paper we present lip-reading experiments with different sets of features extracted from the video sequence. In our experiments we use simple color-based filtering techniques to extract feature vectors from the incoming video signal. Some of these features are directly related to the geometrical properties of the lips (their position …
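The color-based filtering step can be sketched as follows. This is only an illustrative approximation, not the paper's actual pipeline: the red/green-ratio threshold and the synthetic test frame are assumptions, chosen to show how a binary lip mask can yield simple geometric features such as position and extent.

```python
import numpy as np

def lip_mask(frame_rgb, ratio_thresh=1.4):
    """Crude color-based lip filter: lips are typically redder than the
    surrounding skin, so keep pixels whose red/green ratio exceeds a
    threshold (the value 1.4 is an assumption for this sketch)."""
    r = frame_rgb[..., 0].astype(float)
    g = frame_rgb[..., 1].astype(float) + 1e-6  # avoid division by zero
    return (r / g) > ratio_thresh

def geometric_features(mask):
    """Derive simple geometric features (center and extent of the
    detected lip region) from the binary mask."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return {
        "center": (xs.mean(), ys.mean()),
        "width": int(xs.max() - xs.min() + 1),
        "height": int(ys.max() - ys.min() + 1),
    }

# Synthetic 8x8 frame: skin-toned background with a 2x4 "lip" patch
# of strongly red pixels at rows 3-4, columns 2-5.
frame = np.full((8, 8, 3), [120, 110, 100], dtype=np.uint8)
frame[3:5, 2:6] = [200, 80, 70]
feats = geometric_features(lip_mask(frame))
print(feats)  # width 4, height 2, centered on the patch
```

A real system would of course operate on camera frames and track these features over time to form the visual feature vector.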
Current audio-only speech recognition still lacks the expected robustness when the Signal-to-Noise Ratio (SNR) decreases. Video information is not affected by acoustic noise, which makes it an ideal candidate for data fusion to the benefit of speech recognition. In paper [1] the authors have shown that most of the techniques used for extraction of static visual …
In this paper we present a novel way of processing the video signal for a lipreading application, and a post-processing data transformation that can be used alongside it to improve audiovisual speech recognition results. The presented Lip Geometry Estimation (LGE) is compared with other geometry- and image-intensity-based techniques typically deployed for …
This paper describes the development of a large-vocabulary speaker-independent speech recognizer for the Dutch language. The recognizer was built using the Hidden Markov Toolkit and the Polyphone database of recorded Dutch speech. A number of systems have been built, ranging from a simple monophone recognizer to a sophisticated system that uses backed-off …
This paper presents our experiments on continuous audiovisual speech recognition. A number of bimodal systems using feature fusion or fusion within Hidden Markov Models are implemented. Experiments with different fusion techniques and their results are presented. Furthermore, the performance levels of the bimodal system and a unimodal speech recognizer under …
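Feature fusion (often called early fusion) can be sketched as below. The frame rates and feature dimensions are assumptions chosen for illustration (audio features at 100 Hz, video at 25 Hz); the essential step is aligning the two streams to one frame rate and concatenating per frame before handing the joint vectors to a single recognizer.

```python
import numpy as np

def early_fusion(audio_feats, video_feats):
    """Feature-level fusion: upsample the slower video stream by frame
    repetition so both streams share one frame rate, then concatenate
    the per-frame feature vectors."""
    factor = len(audio_feats) // len(video_feats)
    video_up = np.repeat(video_feats, factor, axis=0)[: len(audio_feats)]
    return np.hstack([audio_feats, video_up])

# Hypothetical dimensions: 13 acoustic features per 10 ms frame,
# 6 lip-geometry features per 40 ms video frame, over one second.
audio = np.random.randn(100, 13)
video = np.random.randn(25, 6)
fused = early_fusion(audio, video)
print(fused.shape)  # (100, 19)
```

Fusion within the HMMs themselves (e.g. per-stream observation likelihoods with stream weights) keeps the modalities separate for longer; the snippet above shows only the simpler feature-concatenation variant.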
Non-verbal communication plays an important role in human communication. At the Delft University of Technology a project is running on the automatic recognition of facial expressions. The developed system, ISFER (Integrated System for Facial Expression Recognition), consists of modules suited for the analysis of a frontal view of the face. As the …
In this paper we present how to implement the co-occurrence rules defined by the psychologist Paul Ekman in a computer-animated face. The rules describe the dependencies between the atomic observable movements of the human face (so-called Action Units). They are defined in a form suitable for a human observer who needs to produce a consistent binary scoring of …
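One way such binary co-occurrence rules can be applied mechanically is sketched below. The rule table here is a hypothetical placeholder, not Ekman's actual tables: each entry maps a triggering Action Unit to the set of AUs it excludes, and the function filters a scored AU set into a consistent one.

```python
def enforce_cooccurrence(active_aus, rules):
    """Apply binary co-occurrence rules to a set of active Action
    Units: each rule maps an AU to the AUs it suppresses. Processing
    in ascending AU order is an arbitrary choice for this sketch."""
    result = set(active_aus)
    for au in sorted(active_aus):
        if au in result:                      # still active?
            result -= rules.get(au, set())   # drop the AUs it excludes
    return result

# Hypothetical rule: AU 1 (inner brow raiser) suppresses AU 4 here.
rules = {1: {4}}
print(enforce_cooccurrence({1, 4, 12}, rules))  # {1, 12}
```

For an animated face, the resulting consistent AU set would then drive the corresponding facial deformation parameters.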