An on-line incremental speaker adaptation technique for audio stream transcription

Abstract

In this paper, a novel on-line incremental speaker adaptation technique is proposed for real-time transcription applications such as automatic closed-captioning of live TV programs. Unlike previously proposed methods, our technique does not operate at the utterance level; instead, speaker change detection, speaker clustering, and speaker adaptation are all performed over a short chunk of the incoming audio signal. Incremental adaptation based on feature-space maximum likelihood linear regression (fMLLR) is conducted with respect to a Gaussian mixture model (GMM) that models the acoustic training data. Individual speakers are represented by fMLLR transforms, and these transforms are used both for speaker clustering and for speaker adaptation. Speech recognition experiments show that the proposed incremental adaptation technique is effective: when embedded in an online transcription system applied to television news broadcasts, it yields a 6% relative reduction in word error rate (WER) with respect to a non-adaptive baseline system.
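As a rough illustration of how fMLLR transforms can stand in for speakers, the following Python sketch scores a chunk of feature frames against a set of candidate transforms and picks the best-matching one. It is only a sketch under stated assumptions, not the authors' implementation: the abstract does not give the clustering criterion, so the maximum-likelihood selection rule, the diagonal-covariance GMM, and all function names (`gmm_loglik`, `fmllr_score`, `assign_speaker`) are hypothetical choices made for this example.

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Total log-likelihood of frames X (T, d) under a diagonal-covariance GMM.

    weights: (K,), means: (K, d), variances: (K, d).
    """
    diff = X[:, None, :] - means[None, :, :]                       # (T, K, d)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))  # (T, K)
    log_joint = log_gauss + np.log(weights)                        # (T, K)
    # Log-sum-exp over components, done stably.
    m = log_joint.max(axis=1, keepdims=True)
    per_frame = m[:, 0] + np.log(np.sum(np.exp(log_joint - m), axis=1))
    return float(np.sum(per_frame))

def fmllr_score(X, A, b, gmm):
    """Log-likelihood of a chunk after the affine feature transform y = A x + b.

    The T * log|det A| term is the Jacobian of the feature-space transform.
    """
    Y = X @ A.T + b
    _, logdet = np.linalg.slogdet(A)
    return gmm_loglik(Y, *gmm) + X.shape[0] * logdet

def assign_speaker(X, transforms, gmm):
    """Return the index of the best-scoring transform and all scores.

    Assumption: the chunk is assigned to the speaker whose transform
    maximizes the likelihood; the paper may use a different criterion.
    """
    scores = [fmllr_score(X, A, b, gmm) for A, b in transforms]
    return int(np.argmax(scores)), scores
```

For example, with a single standard-normal component as the GMM and two candidate transforms (the identity, and a shift by -3), a chunk whose frames sit near 3 is assigned to the second transform, since that transform maps the frames back onto the model mean.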

Cite this paper

@inproceedings{Giuliani2013AnOI,
  title={An on-line incremental speaker adaptation technique for audio stream transcription},
  author={Diego Giuliani and Fabio Brugnara},
  booktitle={INTERSPEECH},
  year={2013}
}