I-vector based deep neural network acoustic model adaptation using multilingual language resource

Abstract

I-vector adaptation of DNN-HMM acoustic models has shown clear performance improvements in speech recognition. In this paper, we study this technique on the Babel task. We use Swahili as the target language (50 hours of training data) and six other languages as multilingual resources to train the i-vector extractors. Our study shows that i-vector extractors trained with more multilingual data produce only slightly improved results. Moreover, we compare two i-vector adaptation methods: 1) concatenating the i-vectors with the spectral features; 2) using a neural network to predict a bias term from the i-vectors and adding it to the spectral features. When the DNN is trained from scratch, the two methods perform similarly. However, only the second method is appropriate in a cross-lingual transfer learning scenario. We investigate this scenario as well, and the results show that a further word error rate reduction can be gained.
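The two adaptation methods compared in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the single ReLU hidden layer, the random initialisation, and all dimensions are assumptions chosen only for illustration. The sketch shows one property that may explain why the second method fits transfer learning: concatenation widens the DNN input, whereas the bias method leaves the feature dimension unchanged, so an existing input layer could be reused.

```python
import random


def concat_adapt(frames, ivector):
    """Method 1: append the utterance/speaker i-vector to every spectral frame."""
    return [frame + ivector for frame in frames]


def make_bias_net(ivec_dim, hidden_dim, feat_dim, seed=0):
    """Hypothetical bias network with one hidden layer (architecture assumed)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W1": mat(hidden_dim, ivec_dim), "b1": [0.0] * hidden_dim,
        "W2": mat(feat_dim, hidden_dim), "b2": [0.0] * feat_dim,
    }


def affine(W, b, x):
    """Compute W @ x + b for list-based matrices and vectors."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def predict_bias(net, ivector):
    """Map an i-vector to a bias vector in the spectral feature space."""
    hidden = [max(0.0, h) for h in affine(net["W1"], net["b1"], ivector)]  # ReLU (assumed)
    return affine(net["W2"], net["b2"], hidden)


def bias_adapt(frames, ivector, net):
    """Method 2: add the predicted bias to every spectral frame."""
    bias = predict_bias(net, ivector)
    return [[f + b for f, b in zip(frame, bias)] for frame in frames]


# Toy data: 3 frames of 4-dimensional features and a 2-dimensional i-vector.
frames = [[0.1 * t + d for d in range(4)] for t in range(3)]
ivector = [0.5, -0.2]
net = make_bias_net(ivec_dim=2, hidden_dim=8, feat_dim=4)

concat = concat_adapt(frames, ivector)
shifted = bias_adapt(frames, ivector, net)

print(len(frames[0]), len(concat[0]), len(shifted[0]))
```

In this toy setup the concatenated frames have dimension 4 + 2, while the bias-adapted frames keep dimension 4. In practice the bias network would be trained jointly with, or on top of, the acoustic model rather than left at its random initialisation.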

DOI: 10.1109/APSIPA.2016.7820698

Cite this paper

@article{Xu2016IvectorBD,
  title={I-vector based deep neural network acoustic model adaptation using multilingual language resource},
  author={Haihua Xu and Wei Rao and Xiong Xiao and Hao Huang and Chng Eng Siong and Haizhou Li},
  journal={2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)},
  year={2016},
  pages={1-5}
}