Challenges in the Fusion of Video and Audio for Robust Speech Recognition


As speech recognizers become more robust, they are increasingly accepted as an essential component of human-computer interaction. State-of-the-art speaker-independent speech recognizers exist with word recognition error rates below 10%. To achieve even higher and more robust recognition performance, multi-modal speech recognition techniques that combine video and…


