Content-aware local variability vector for speaker verification with short utterance


The i-vector has been shown to be very effective for speaker verification with long-duration speech utterances. However, when the test utterances are short, the content mismatch between the enrollment and test utterances limits the performance of the i-vector system. This paper proposes to extract local session variability vectors for different phonetic classes of an utterance, instead of estimating the session variability across the whole utterance as the i-vector does. Using the posteriors given by a deep neural network (DNN) trained for phone-state classification, the local vectors represent the session variability contained in specific phonetic content. Our experiments show that the content-aware local vectors cope better with the content mismatch between training and test utterances of short duration in text-independent, text-constrained and text-dependent tasks.
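The abstract does not give the paper's equations, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes (1) Baum-Welch statistics are accumulated with DNN phone-state posteriors in place of GMM posteriors, (2) a "phonetic class" is a hypothetical grouping of phone-state components, and (3) each local vector is the standard i-vector posterior mean computed from only that class's statistics, using per-component blocks of a pretrained variability matrix. All function names, the class grouping and the random model parameters are illustrative, not the authors' implementation.

```python
import numpy as np


def baum_welch_stats(X, post, means):
    """Zeroth- and centered first-order statistics per component.

    X:     (T, D) acoustic frames
    post:  (T, C) frame posteriors over C components (assumed here to be
           DNN phone-state posteriors, each row summing to 1)
    means: (C, D) component means
    """
    N = post.sum(axis=0)                     # (C,) zeroth-order stats
    F = post.T @ X - N[:, None] * means      # (C, D) first-order stats, centered
    return N, F


def local_vector(N, F, T_mat, sigma, comps):
    """i-vector-style posterior mean restricted to a subset of components.

    T_mat: (C, D, R) per-component blocks of the variability matrix
    sigma: (C, D) diagonal covariances
    comps: indices of the components in one (hypothetical) phonetic class
    """
    R = T_mat.shape[2]
    precision = np.eye(R)                    # identity from the N(0, I) prior
    linear = np.zeros(R)
    for c in comps:
        scaled = T_mat[c] / sigma[c][:, None]        # Sigma_c^{-1} T_c, (D, R)
        precision += N[c] * T_mat[c].T @ scaled      # + N_c T_c^T Sigma_c^{-1} T_c
        linear += scaled.T @ F[c]                    # + T_c^T Sigma_c^{-1} F_c
    return np.linalg.solve(precision, linear)


def content_aware_vectors(X, post, means, T_mat, sigma, classes):
    """One local vector per phonetic class instead of one per utterance."""
    N, F = baum_welch_stats(X, post, means)
    return {name: local_vector(N, F, T_mat, sigma, comps)
            for name, comps in classes.items()}


# Toy demo with random stand-ins for the features, DNN output and model.
rng = np.random.default_rng(0)
T_frames, D, C, R = 200, 5, 6, 3
X = rng.normal(size=(T_frames, D))
logits = rng.normal(size=(T_frames, C))
post = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax
means = rng.normal(size=(C, D))
sigma = rng.uniform(0.5, 2.0, size=(C, D))
T_mat = rng.normal(scale=0.1, size=(C, D, R))
classes = {"vowels": [0, 1, 2], "consonants": [3, 4, 5]}  # hypothetical grouping

vecs = content_aware_vectors(X, post, means, T_mat, sigma, classes)
for name, v in vecs.items():
    print(name, v.shape)
```

When a short utterance contains little speech from a class, its statistics are small and the estimate shrinks toward the prior mean (the zero vector), which is one reason the restriction to matched phonetic content can be useful. Passing every component as a single class recovers the ordinary utterance-level i-vector.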

DOI: 10.1109/ICASSP.2016.7472726

5 Figures and Tables

@article{Chen2016ContentawareLV,
  title={Content-aware local variability vector for speaker verification with short utterance},
  author={Liping Chen and Kong-Aik Lee and Chng Eng Siong and Bin Ma and Haizhou Li and Li-Rong Dai},
  journal={2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2016},
  pages={5485-5489}
}