Exploiting a 'gaze-Lombard effect' to improve ASR performance in acoustically noisy settings

Abstract

Previous use of gaze (eye movement) to improve Automatic Speech Recognition (ASR) performance involves shifting language model probability mass towards the subset of the vocabulary related to a person’s visual attention. Motivated to improve ASR performance in acoustically noisy settings by using gaze information selectively, we propose ‘Selective Gaze-Contingent ASR’ (SGC-ASR). By modelling the relationship between gaze and speech conditioned on noise level, a ‘gaze-Lombard effect’, simultaneous dynamic adaptation of the acoustic models and the language model is achieved. Evaluation on a matched set of gaze and speech data recorded under varying speech-babble noise conditions yields word error rate (WER) improvements. The work highlights the use of gaze information in dynamic model-based adaptation methods for noise-robust ASR.
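To illustrate the general idea of shifting language model probability mass towards gaze-attended words, with the degree of adaptation conditioned on noise level, the following Python sketch may help. It is not the authors' implementation; the names (adapt_lm, gaze_words, noise_snr_db) and the SNR-based weighting heuristic are illustrative assumptions only.

# Minimal sketch (not the authors' method): gaze-contingent interpolation
# of a unigram language model, with the interpolation weight tied to the
# estimated acoustic noise level (lower SNR -> rely more on gaze).
def adapt_lm(base_lm, gaze_words, noise_snr_db, max_weight=0.3):
    """Shift probability mass towards words related to visual attention.

    base_lm      : dict mapping word -> baseline probability (sums to 1)
    gaze_words   : set of words associated with the current gaze fixation
    noise_snr_db : assumed SNR estimate controlling how much gaze is trusted
    """
    # Heuristic: trust gaze more as the SNR drops below 20 dB.
    weight = max_weight * min(1.0, max(0.0, (20.0 - noise_snr_db) / 20.0))

    gaze_mass = sum(base_lm[w] for w in gaze_words if w in base_lm)
    if gaze_mass == 0.0:
        return dict(base_lm)

    adapted = {}
    for w, p in base_lm.items():
        if w in gaze_words:
            # Redistribute a fraction `weight` of the total mass onto gazed
            # words, proportionally to their baseline probabilities.
            adapted[w] = (1.0 - weight) * p + weight * (p / gaze_mass)
        else:
            adapted[w] = (1.0 - weight) * p
    return adapted

if __name__ == "__main__":
    lm = {"screen": 0.05, "menu": 0.05, "weather": 0.10, "the": 0.80}
    print(adapt_lm(lm, {"screen", "menu"}, noise_snr_db=5.0))

The adapted distribution still sums to one; only the balance between gazed and non-gazed words changes, and it changes more aggressively as noise increases.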

DOI: 10.1109/ICASSP.2014.6853899

Cite this paper

@inproceedings{Cooke2014ExploitingA,
  title={Exploiting a 'gaze-Lombard effect' to improve ASR performance in acoustically noisy settings},
  author={Neil Cooke and Ao Shen and Martin J. Russell},
  booktitle={ICASSP},
  year={2014}
}