Samuel Pachoud

In this paper, we present a spatio-temporal feature representation and a probabilistic matching function to recognise lip movements from pronounced digits. Our model (1) automatically selects spatio-temporal features extracted from 10 digit model templates and (2) matches them with probe video sequences. Spatio-temporal features embed lip movements from ...
We extract relevant and informative audiovisual features using multiple multi-class Support Vector Machines with probabilistic outputs, and demonstrate the approach in a noisy audiovisual speech reading scenario. We first extract visual spatio-temporal features and audio cepstral coefficients from pronounced digit sequences. Two classifiers are then trained ...
For the recognition of speech, in particular spoken digits, captured in video with poor sound due to noise, we develop a novel audiovisual fusion technique that performs significantly better than utilising either the audio or the video signal alone. Specifically, we present an audiovisual intermediate fusion strategy to locate speaker-dependent pronounced digits in ...