Speaker recognition using common passphrases in RedDots

@inproceedings{Aronowitz2017SpeakerRU,
  title={Speaker recognition using common passphrases in RedDots},
  author={Hagai Aronowitz},
  booktitle={2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2017},
  pages={5405--5409}
}
  • Hagai Aronowitz
  • Published 1 March 2017
  • Computer Science
  • 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
In this paper we report our work on the recently collected text-dependent speaker recognition dataset named RedDots, with a focus on the common passphrase condition. [...] We then report several strategies to train on RedDots itself using up to 40 speakers for training. The GMM-NAP framework is used as a baseline. We report the following novelties: First, we demonstrate the use of bagging for improved accuracy. Second, we estimate the EER of a passphrase using metadata only. Third, the estimated EERs…
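
The abstract reports results as equal error rates (EERs). As a reminder of what that metric measures, here is a minimal sketch that computes an empirical EER from lists of target and impostor scores. It follows the standard score-based definition and is not the paper's metadata-only estimator; the function name and the toy scores are illustrative assumptions.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Return (eer, threshold) at the point where the false-accept and
    false-reject rates are closest. Hypothetical helper for illustration."""
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # False reject: a target trial scoring below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False accept: an impostor trial scoring at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of the two rates at the closest crossing.
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Toy scores (made up for this example, not taken from RedDots).
targets = [2.0, 3.0, 4.0, 5.0]
impostors = [0.0, 1.0, 2.5, 3.5]
eer, threshold = equal_error_rate(targets, impostors)
print(eer, threshold)
```

With only a handful of trials the two error rates may never cross exactly, which is why the sketch reports the midpoint at the closest threshold.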

References

Exploiting supervector structure for speaker recognition trained on a small development set
TLDR
This work investigates the ability to build accurate speaker recognition systems using small amounts of data from the target domain, without any out-of-domain data, by exploiting the structural nature of GMM supervectors.
Speaker recognition using matched filters
  • Hagai Aronowitz
  • Computer Science
  • 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2016
TLDR
It is shown how a matched filter can be optimized to maximize SNR (signal to noise ratio) when the noise component includes both intra-speaker variability and center/mean hyper-parameter variability.
Audio enhancing with DNN autoencoder for speaker recognition
TLDR
A DNN-based autoencoder for speech enhancement is presented, along with its use in speaker recognition systems for distant-microphone and noisy data. A more detailed analysis on various conditions of NIST SRE 2010 and PRISM suggests that the proposed preprocessing is a promising and efficient way to build a robust speaker recognition system.
Speaker recognition in two-wire test sessions
TLDR
This paper directly performs the recognition on the nonsegmented (or imperfectly diarized) speech and proposes improved recognition techniques both in the frame domain and in the model domain that reduce error rate significantly.
Analysis of i-vector Length Normalization in Speaker Recognition Systems
TLDR
The proposed approach deals with the non-Gaussian behavior of i-vectors by performing a simple length normalization, which allows the use of probabilistic models with Gaussian assumptions that yield performance equivalent to that of more complicated systems based on heavy-tailed assumptions.
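The length normalization mentioned in this summary can be illustrated with a short sketch: each vector is scaled to unit Euclidean norm. This shows only the scaling step under that assumption; any centering or whitening a full system might apply beforehand is omitted, and the function name is hypothetical.

```python
import math


def length_normalize(vector):
    """Scale a vector to unit Euclidean length (illustrative helper)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot length-normalize a zero vector")
    return [x / norm for x in vector]


# A toy 2-dimensional "i-vector"; real i-vectors have hundreds of dimensions.
unit = length_normalize([3.0, 4.0])
print(unit)
```

After this step every vector lies on the unit sphere, so only its direction carries information.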
Simple and efficient speaker comparison using approximate KL divergence
TLDR
A new approximate KL divergence distance extending earlier GMM parameter vector SVM kernels is used and a weighted nuisance projection method for channel compensation is applied, and a simple eigenvector method of training is presented.
New Developments in Voice Biometrics for User Authentication
TLDR
This work investigates the use of state-of-the-art text-independent and text-dependent speaker verification technology for user authentication and shows how to adapt techniques such as joint factor analysis (JFA), Gaussian mixture models with nuisance attribute projection (GMM-NAP), and hidden Markov models with NAP to obtain improved results for new authentication scenarios and environments.
Efficient score normalization for speaker recognition
TLDR
The importance of score normalization for speaker identification is demonstrated, and accuracy is improved considerably using various normalization techniques, including T-norm and Z-norm.
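Z-norm and T-norm share the same arithmetic: a raw score is shifted and scaled by the mean and standard deviation of a set of cohort scores. The sketch below shows that shared step in its textbook form, not necessarily the efficient variant this reference proposes; the function name and numbers are assumptions for illustration.

```python
from statistics import mean, pstdev


def normalize_score(raw_score, cohort_scores):
    """Return (raw_score - mu) / sigma using cohort statistics.

    Z-norm: cohort_scores come from scoring the enrolled speaker model
            against a set of impostor utterances.
    T-norm: cohort_scores come from scoring the test utterance against
            a set of impostor (cohort) models.
    """
    mu = mean(cohort_scores)
    sigma = pstdev(cohort_scores)
    if sigma == 0:
        raise ValueError("cohort scores have zero variance")
    return (raw_score - mu) / sigma


# Made-up cohort scores with mean 2.0 and standard deviation 1.0.
print(normalize_score(5.0, [1.0, 3.0]))
```

The two methods differ only in where the cohort scores come from, which is why one helper can serve both.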
Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition
  • Chanwoo Kim, R. Stern
  • Computer Science
  • IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • 2016
TLDR
Experimental results demonstrate that PNCC processing provides substantial improvements in recognition accuracy compared to MFCC and PLP processing for speech in the presence of various types of additive noise and in reverberant environments, with only slightly greater computational cost than conventional MFCC processing.
Bagging predictors
TLDR
Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy.
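The bagging procedure summarized here can be sketched as follows: draw bootstrap replicates of the training set, fit one predictor per replicate, and average their outputs. This is a generic regression-style sketch, not the way the RedDots paper applies bagging; `bagged_predictor`, `fit_mean`, and the toy data are hypothetical names and values chosen for illustration.

```python
import random


def bagged_predictor(train_set, fit, n_replicates=25, seed=0):
    """Fit one model per bootstrap replicate and average their predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_replicates):
        # Bootstrap replicate: sample len(train_set) items with replacement.
        replicate = [rng.choice(train_set) for _ in train_set]
        models.append(fit(replicate))

    def predict(x):
        outputs = [model(x) for model in models]
        return sum(outputs) / len(outputs)

    return predict


def fit_mean(samples):
    """Trivial base learner: always predict the mean target of its samples."""
    avg = sum(y for _, y in samples) / len(samples)
    return lambda x: avg


# Toy (input, target) pairs.
data = [(0, 1.0), (1, 2.0), (2, 3.0)]
predict = bagged_predictor(data, fit_mean)
print(predict(1))
```

For classification, the averaging step would be replaced by a majority vote over the replicate models.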