We aim to model the appearance of the lower face region to assist visual feature extraction for audiovisual speech processing applications. In this paper, we present a neural-network-based statistical appearance model of the lips that classifies pixels as belonging to the lips, skin, or inner mouth classes. This model requires labeled examples to be…
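The abstract above does not give the network's inputs, architecture, or training procedure. As a rough illustration only, the following minimal sketch assumes per-pixel RGB values as features and a one-hidden-layer network trained by batch gradient descent with a softmax output over the three classes (lips, skin, inner mouth). The function names, cluster colours, and hyperparameters are hypothetical and are not taken from the paper.

```python
import numpy as np

# The three pixel classes named in the abstract; index order is an assumption.
CLASSES = ("lips", "skin", "inner_mouth")


def make_synthetic_pixels(n_per_class, rng):
    # Hypothetical mean RGB colours in [0, 1]; real training data would come
    # from hand-labeled images, which this sketch does not have.
    centers = np.array([[0.70, 0.30, 0.30],   # lips
                        [0.80, 0.60, 0.50],   # skin
                        [0.20, 0.10, 0.10]])  # inner mouth
    X = np.vstack([c + rng.normal(0.0, 0.05, (n_per_class, 3)) for c in centers])
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return np.clip(X, 0.0, 1.0), y


def init_params(n_in, n_hidden, n_out, rng):
    return {
        "W1": rng.normal(0.0, 0.5, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def forward(params, X):
    # Hidden layer with tanh, then softmax over the class scores.
    h = np.tanh(X @ params["W1"] + params["b1"])
    logits = h @ params["W2"] + params["b2"]
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    return h, p


def mean_cross_entropy(params, X, y):
    _, p = forward(params, X)
    return float(-np.log(p[np.arange(len(y)), y] + 1e-12).mean())


def train(X, y, n_hidden=8, lr=0.5, epochs=500, seed=0):
    # Plain batch gradient descent on the mean cross-entropy loss.
    rng = np.random.default_rng(seed)
    params = init_params(X.shape[1], n_hidden, len(CLASSES), rng)
    Y = np.eye(len(CLASSES))[y]  # one-hot targets from the labeled pixels
    n = X.shape[0]
    for _ in range(epochs):
        h, p = forward(params, X)
        d_logits = (p - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        dW2 = h.T @ d_logits
        db2 = d_logits.sum(axis=0)
        d_h = (d_logits @ params["W2"].T) * (1.0 - h ** 2)  # tanh derivative
        dW1 = X.T @ d_h
        db1 = d_h.sum(axis=0)
        params["W1"] -= lr * dW1
        params["b1"] -= lr * db1
        params["W2"] -= lr * dW2
        params["b2"] -= lr * db2
    return params


def classify(params, X):
    # One class index per pixel: 0 = lips, 1 = skin, 2 = inner mouth.
    return forward(params, X)[1].argmax(axis=1)
```

For an image, the pixels would be flattened to an `(n_pixels, 3)` array before calling `classify(train(X, y), pixels)`, and the resulting vector reshaped to the image height and width to obtain a label map of the lower face.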
In this paper, we describe audiovisual automatic speech recognition experiments carried out using visual parameters extracted from "natural" images. Unlike many other experiments in the AV-ASR field, these visual parameters are obtained without any hand-labeling phase and are inherently noisy because of the extraction process. We evaluate our models with…
In this paper, we first present how we used speech bimodality to build our shape and appearance model for AV-ASR. We then report the classification results obtained with one hand-labelled and two automatically built global appearance models of the lips. Finally, we propose several measures for evaluating the quality of lip location.
We present here a new, freely available audiovisual speech database. Unlike other existing corpora, the LIUM-AVS corpus was recorded in conditions we describe as natural, which we consider much closer to real application conditions than those of other databases. This database was recorded without artificial lighting using an analog camcorder in camera…