Speaker Verification in Mismatched Conditions with Frustratingly Easy Domain Adaptation

@inproceedings{Alam2018SpeakerVI,
  title={Speaker Verification in Mismatched Conditions with Frustratingly Easy Domain Adaptation},
  author={Md. Jahangir Alam and Gautam Bhattacharya and Patrick Kenny},
  booktitle={Odyssey},
  year={2018}
}
The 2016 edition of the NIST speaker recognition evaluation tests the ability of speaker verification systems to deal with domain mismatch between development and test data. In order to adapt to new languages, a small amount of unlabeled, in-domain data was provided, warranting an unsupervised approach to learning from this data. In this work we adapt a simple domain adaptation strategy to the speaker verification problem. We test our approach using two types of speaker embeddings i…
Speaker Verification Using End-to-end Adversarial Language Adaptation
TLDR
This paper examines several configurations, such as the use of (pseudo-)labels on the target domain as well as domain labels in the feature extractor, and demonstrates the effectiveness of the adversarial adaptation method on the challenging NIST SRE16 and SRE18 benchmarks.
A Framework for Adapting DNN Speaker Embedding Across Languages
TLDR
A maximum mean discrepancy (MMD) based framework for adapting deep neural network (DNN) speaker embedding across languages is proposed, featuring multi-level domain loss, separate batch normalization, and consistency regularization; it is shown that minimizing domain discrepancy at both the frame and utterance levels performs significantly better than at the utterance level alone.
VAE-based Domain Adaptation for Speaker Verification
  • Xueyi Wang, Lantian Li, Dong Wang
  • Computer Science, Engineering
  • 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
  • 2019
TLDR
A domain adaptation approach based on Variational Auto-Encoder (VAE) that transforms x-vectors to a regularized latent space; within this latent space, a small amount of data from the target domain is sufficient to accomplish the adaptation.
Investigating Domain Sensitivity of DNN Embeddings for Speaker Recognition Systems
TLDR
The results show that domain mismatch can be compensated effectively using autoencoders to adapt the out-domain data to in-domain and two novel deep domain adaptation techniques based on autoencoder architectures trained on embeddings in an unsupervised fashion.
The NEC-TT 2018 Speaker Verification System
This paper describes the NEC-TT speaker verification system for the 2018 NIST speaker recognition evaluation (SRE’18). We present the details of data partitioning, x-vector speaker embedding, data…
The DKU-SMIIP System for NIST 2018 Speaker Recognition Evaluation
In this paper, we present the system submission for the NIST 2018 Speaker Recognition Evaluation by DKU Speech and Multi-Modal Intelligent Information Processing (SMIIP) Lab. We explore various kinds…
The TalTech Systems for the Short-Duration Speaker Verification Challenge 2020
TLDR
This paper presents the Tallinn University of Technology systems submitted to the Short-duration Speaker Verification Challenge 2020, focusing on text-dependent and text-independent speaker verification with some cross-lingual aspects.
Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances
TLDR
This paper presents approaches aimed at improving the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and at reducing system quality degradation for short utterances; it confirms that ResNet architectures outperform the standard x-vector approach in terms of speaker verification quality.
Semi-supervised Nuisance-attribute Networks for Domain Adaptation
TLDR
Using SNANs as a preprocessing step for PLDA training, this paper achieves a relative improvement of 11.8% in EER on NIST 2016 SRE compared to PLDA without adaptation and finds that the semi-supervised approach can further improve SNANs’ performance.
NEC-TT System for Mixed-Bandwidth and Multi-Domain Speaker Recognition
TLDR
A detailed description and analysis of the design methodology, data augmentation, bandwidth extension, multi-head attention, PLDA adaptation, and other components that have contributed to good performance in NEC-TT’s SRE’18 results are provided.

References

SHOWING 1-10 OF 21 REFERENCES
Speaker Verification Under Adverse Conditions Using i-Vector Adaptation and Neural Networks
TLDR
This paper applies minimum divergence training to adapt a conventional i-vector extractor to the task domain to tackle the domain mismatch problem and proposes a new Beta-Bernoulli backend that models the features supplied by the speaker classifier network.
Deep Speaker Embeddings for Short-Duration Speaker Verification
TLDR
This work proposes to use deep neural networks to learn short-duration speaker embeddings based on a deep convolutional architecture wherein recordings are treated as images, and advocates treating utterances as images or ‘speaker snapshots’, much like in face recognition.
X-Vectors: Robust DNN Embeddings for Speaker Recognition
TLDR
This paper uses data augmentation, consisting of added noise and reverberation, as an inexpensive method to multiply the amount of training data and improve robustness of deep neural network embeddings for speaker recognition.
End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances
TLDR
An end-to-end system which directly learns a mapping from speech features to a compact fixed length speaker discriminative embedding where the Euclidean distance is employed for measuring similarity within trials.
Deep neural network-based speaker embeddings for end-to-end speaker verification
TLDR
It is shown that given a large number of training speakers, the proposed system outperforms an i-vector baseline in equal error-rate (EER) and at low miss rates.
Deep Neural Network Embeddings for Text-Independent Speaker Verification
TLDR
It is found that the embeddings outperform i-vectors for short speech segments and are competitive on long duration test conditions, which are the best results reported for speaker-discriminative neural networks when trained and tested on publicly available corpora.
Text-dependent speaker verification: Classifiers, databases and RSR2015
TLDR
The HiLAM system, based on a three layer acoustic architecture, and an i-vector/PLDA system, outperforms the state-of-the-art i-vector system in most of the scenarios and provides a reference evaluation scheme and a reference performance on RSR2015 database to the research community.
Front-End Factor Analysis For Speaker Verification
  • Florin Curelaru
  • Computer Science
  • 2018 International Conference on Communications (COMM)
  • 2018
TLDR
This paper investigates which configuration and which parameters lead to the best performance of an i-vectors/PLDA based speaker verification system and presents at the end some preliminary experiments in which the utterances comprised in the CSTR VCTK corpus were used besides utterances from MIT-MDSVC for training the total variability covariance matrix and the underlying PLDA matrices.
Deep neural networks for small footprint text-dependent speaker verification
TLDR
Experimental results show the DNN based speaker verification system achieves good performance compared to a popular i-vector system on a small footprint text-dependent speaker verification task and is more robust to additive noise and outperforms the i-vector system at low False Rejection operating points.
Deep Speaker: an End-to-End Neural Speaker Embedding System
TLDR
Results that suggest adapting from a model trained with Mandarin can improve accuracy for English speaker recognition are presented, and it is suggested that Deep Speaker outperforms a DNN-based i-vector baseline.