A Denoising Autoencoder for Speaker Recognition. Results on the MCE 2018 Challenge

@inproceedings{Font2019ADA,
  title={A Denoising Autoencoder for Speaker Recognition. Results on the MCE 2018 Challenge},
  author={Roberto Font},
  booktitle={ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2019},
  pages={6016--6020}
}
  • R. Font
  • Published 12 May 2019
  • Computer Science
  • ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
We propose a Denoising Autoencoder (DAE) for speaker recognition, trained to map each individual i-vector to the mean of all i-vectors belonging to that particular speaker. The aim of this DAE is to compensate for inter-session variability and increase the discriminative power of the i-vectors prior to PLDA scoring. We test the proposed approach on the MCE 2018 1st Multi-target speaker detection and identification Challenge Evaluation. This evaluation presents a call-center fraud detection…
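The training objective described in the abstract (each i-vector is mapped to the mean of its speaker's i-vectors) can be sketched as follows. This is a minimal illustration and not the paper's implementation: the one-hidden-layer tanh network, the mean-squared-error loss, the full-batch gradient descent, the hyperparameters, the synthetic 10-dimensional data, and the names `speaker_mean_targets`, `denoise`, and `train_dae` are all assumptions made for the example. PLDA scoring is not included.

```python
import numpy as np


def speaker_mean_targets(ivectors, speaker_ids):
    """For every i-vector, return the mean i-vector of its speaker."""
    targets = np.empty_like(ivectors)
    for spk in np.unique(speaker_ids):
        mask = speaker_ids == spk
        targets[mask] = ivectors[mask].mean(axis=0)
    return targets


def denoise(params, x):
    """Apply the trained network (assumed: one tanh hidden layer, linear output)."""
    w1, b1, w2, b2 = params
    return np.tanh(x @ w1 + b1) @ w2 + b2


def train_dae(x, y, hidden=64, lr=0.05, epochs=300, seed=0):
    """Fit the network to map x to y with an MSE loss (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    dim = x.shape[1]
    w1 = rng.normal(0.0, 1.0 / np.sqrt(dim), (dim, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, dim))
    b2 = np.zeros(dim)
    losses = []
    for _ in range(epochs):
        # Forward pass
        h = np.tanh(x @ w1 + b1)
        err = h @ w2 + b2 - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass for the mean-squared error
        g_out = 2.0 * err / err.size
        g_h = (g_out @ w2.T) * (1.0 - h ** 2)
        # Full-batch gradient descent step
        w2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        w1 -= lr * (x.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return (w1, b1, w2, b2), losses


# Synthetic stand-in for i-vectors: 5 speakers, 20 sessions each, 10 dimensions.
rng = np.random.default_rng(1)
speaker_ids = np.repeat(np.arange(5), 20)
centers = rng.normal(size=(5, 10))
ivectors = centers[speaker_ids] + 0.5 * rng.normal(size=(100, 10))

targets = speaker_mean_targets(ivectors, speaker_ids)
params, losses = train_dae(ivectors, targets)
print(f"MSE before training: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

In a full system, the output of `denoise` would replace the raw i-vectors as input to the PLDA back-end; the toy data here only shows how the targets are built and that the mapping can be learned.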

MCE 2018: The 1st Multi-target Speaker Detection and Identification Challenge Evaluation (MCE) Plan, Dataset and Baseline System

The Multitarget Challenge aims to assess how well current speech technology is able to determine whether or not a recorded utterance was spoken by one of a large number of 'blacklisted' speakers. It

Latent Space Representation for Multi-Target Speaker Detection and Identification with a Sparse Dataset Using Triplet Neural Networks

A neural network approach that builds a latent space for different classifiers to solve the Multi-Target Speaker Detection and Identification Challenge Evaluation 2018 (MCE 2018) dataset, and shows that the representational power of TNNs is especially evident when training on small datasets with few instances available per class.

On Open-Set Speaker Identification with I-Vectors

The proposed system consists of an outlier detector in combination with a classical closed-set speaker identification chain and utilizes an effective preprocessing technique for i-vectors, called linear alignment, which is justified both theoretically and experimentally by comparing multiple outlier detectors.

On Open-Set Classification with L3-Net Embeddings for Machine Listening Applications

  • Kevin Wilkinghoff
  • Computer Science
  • 2020 28th European Signal Processing Conference (EUSIPCO)
  • 2021
Presents a neural network that combines all L3-Net embeddings belonging to one recording into a single vector by using an x-vector mechanism, together with an open-set classification system built on it.

Fusion of Biological Data Using Multimodal Neural Networks and Matrix Factorization (Zlivanje bioloških podatkov z uporabo večmodalnih nevronskih mrež in razcepa matrik)

Every year, hundreds of new studies are carried out in the field of bioinformatics. Their results are scattered across different databases that are not connected to one another, or are not accessible at all

References

On autoencoders in the i-vector space for speaker recognition

This investigation studies the properties of DAE in the i-vector space, analyzes different strategies for initializing and training the back-end parameters, and proposes several improvements to the system to increase accuracy.

MCE 2018: The 1st Multi-target Speaker Detection and Identification Challenge Evaluation (MCE) Plan, Dataset and Baseline System

The Multitarget Challenge aims to assess how well current speech technology is able to determine whether or not a recorded utterance was spoken by one of a large number of 'blacklisted' speakers. It

A novel scheme for speaker recognition using a phonetically-aware deep neural network

We propose a novel framework for speaker recognition in which extraction of sufficient statistics for the state-of-the-art i-vector model is driven by a deep neural network (DNN) trained for

Autoencoder Based Domain Adaptation for Speaker Recognition Under Insufficient Channel Information

The proposed approach combines an autoencoder with a denoising autoencoder to adapt a resource-rich development dataset to the test domain, exploits the limited in-domain dataset effectively, and shows significant improvements over baselines and results from other prior studies.

X-Vectors: Robust DNN Embeddings for Speaker Recognition

This paper uses data augmentation, consisting of added noise and reverberation, as an inexpensive method to multiply the amount of training data and improve robustness of deep neural network embeddings for speaker recognition.

i-Vector Transformation Using a Novel Discriminative Denoising Autoencoder for Noise-Robust Speaker Recognition

An i-vector transformation using neural networks for noise-robust speaker recognition shows 32% better error performance than a baseline system and outperforms conventional methods such as multi-condition training and a basic denoising autoencoder.

Front-End Factor Analysis For Speaker Verification

  • Florin Curelaru
  • Computer Science
  • 2018 International Conference on Communications (COMM)
  • 2018
This paper investigates which configuration and parameters lead to the best performance of an i-vector/PLDA-based speaker verification system, and closes with preliminary experiments in which utterances from the CSTR VCTK corpus were used alongside utterances from MIT-MDSVC to train the total variability covariance matrix and the underlying PLDA matrices.

Deep Neural Network Embeddings for Text-Independent Speaker Verification

It is found that the embeddings outperform i-vectors for short speech segments and are competitive on long duration test conditions, which are the best results reported for speaker-discriminative neural networks when trained and tested on publicly available corpora.

Deep Neural Networks for extracting Baum-Welch statistics for Speaker Recognition

Although the proposed i-vectors yield inferior performance compared to the standard ones, they are capable of attaining 16% relative improvement when fused with them, meaning that they carry useful complementary information about the speaker’s identity.

Analysis of Score Normalization in Multilingual Speaker Recognition

The analysis shows that adaptive score normalization (using the top-scoring files per trial) selects cohorts that in 68% of cases contain recordings from the same language, and in 92% of cases from the same gender, as the enrollment and test recordings.