Using phone log-likelihood ratios as features for speaker recognition

Abstract

The so-called Phone Log-Likelihood Ratio (PLLR) features, computed on phone posterior probabilities provided by phonetic decoders, convey acoustic-phonetic information in a sequence of frame-level vectors. Thus, PLLRs can be easily plugged into traditional acoustic systems simply by replacing MFCCs, PLPs or any other frame-level representation. PLLR features were used under an iVector-PLDA approach in our submission to the NIST 2012 Speaker Recognition Evaluation (SRE). In this work, we assess the suitability of these features for speaker recognition. Results on the telephone clean speech condition of the NIST 2010 and 2012 SRE show that, although the system based on PLLR features does not reach state-of-the-art performance, including it in a fusion with a traditional acoustic-based system (trained on MFCC features) provides remarkable gains in performance (among the best reported in the NIST 2012 SRE telephone condition without added noise), revealing a fruitful way of using acoustic-phonetic information for speaker recognition.
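As an illustration of how phone posteriors can be turned into frame-level features, the following minimal sketch maps one frame of posteriors to log-likelihood ratios. The abstract does not state the formula, so the definition used here, log(p_i / ((1 - p_i) / (N - 1))) for N phone classes, is an assumption taken from the usual PLLR formulation; the function name `pllr`, the `floor` parameter and the example posteriors are hypothetical.

```python
import math


def pllr(posteriors, floor=1e-10):
    """Map one frame of phone posteriors to log-likelihood ratios.

    Assumed definition (not given in the abstract):
        LLR_i = log(p_i / ((1 - p_i) / (N - 1)))
    where N is the number of phone classes.
    """
    n = len(posteriors)
    if n < 2:
        raise ValueError("need at least two phone classes")
    total = sum(posteriors)
    feats = []
    for p in posteriors:
        p = p / total                        # renormalise so the frame sums to 1
        p = min(max(p, floor), 1.0 - floor)  # keep both logarithms finite
        feats.append(math.log(p) - math.log(1.0 - p) + math.log(n - 1))
    return feats


# Hypothetical posteriors for a 3-phone decoder on a single frame.
frame = [0.7, 0.2, 0.1]
print([round(v, 3) for v in pllr(frame)])
```

Under this assumed definition a uniform posterior vector maps to all zeros, and the output has one dimension per phone, so a sequence of such vectors can stand in for MFCC or PLP frames in the rest of the pipeline.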

Cite this paper

@inproceedings{Dez2013UsingPL, title={Using phone log-likelihood ratios as features for speaker recognition}, author={Mireia D{\'i}ez and Amparo Varona and Mikel Pe{\~n}agarikano and Luis Javier Rodr{\'i}guez-Fuentes and Germ{\'a}n Bordel}, booktitle={INTERSPEECH}, year={2013} }