Compensate multiple distortions for speaker recognition systems

  title={Compensate multiple distortions for speaker recognition systems},
  author={Mohammad MohammadAmini and Driss Matrouf and Jean-François Bonastre and Romain Serizel and Sandipana Dowerah and Denis Jouvet},
  journal={2021 29th European Signal Processing Conference (EUSIPCO)},
The performance of speaker recognition systems degrades dramatically in severe conditions, in the presence of additive noise and/or reverberation. In some cases there is only one kind of domain mismatch, such as additive noise or reverberation, but in many cases there is more than one distortion. Finding a solution for domain adaptation in the presence of different distortions is a challenge. In this paper we investigate the situation in which there is none, one or more of the following…

Learning Noise Robust ResNet-Based Speaker Embedding for Speaker Recognition

Two new variants of ResNet-based speaker recognition systems are proposed that make the speaker embedding more robust against additive noise and reverberation and extract x-vectors in noisy environments that are close to their corresponding x-vector in a clean environment.

A Comprehensive Exploration of Noise Robustness and Noise Compensation in ResNet and TDNN-based Speaker Recognition Systems

In most cases the performance of ResNet without compensation is superior to TDNN with noise compensation, and in all cases the ResNet system is more robust than TDNN.

Barlow Twins self-supervised learning for robust speaker recognition

In the proposed system, the Barlow Twins objective function is calculated in the embedding layer and it is optimized jointly with the speaker classifier loss function, integrated with the ResNet-based speaker embedding system.



Robust Speaker Identification in Noisy and Reverberant Conditions

Robust SID is performed with speaker models trained in selected reverberant conditions, on the basis of bounded marginalization and direct masking, which substantially improves SID performance over related systems across a wide range of reverberation times and signal-to-noise ratios.

Analysis of DNN Speech Signal Enhancement for Robust Speaker Recognition

Data augmentation versus noise compensation for x-vector speaker recognition systems in noisy environments

This work investigates whether explicit noise compensation techniques remain effective despite the general noise robustness of these systems, and proposes adding a denoising x-vector subsystem before scoring.

Denoising x-vectors for Robust Speaker Recognition

This paper denoises x-vector speaker embeddings by leveraging denoising autoencoders (DAE) and proposes a novel DAE architecture, named Deep Stacked DAE, composed of several DAEs in which each DAE receives as input the output of its predecessor DAE concatenated with the difference between the noisy x-vectors and its predecessor's output.

Deep Learning Based Multi-Channel Speaker Recognition in Noisy and Reverberant Environments

It is shown that rank-1 approximation of a speech covariance matrix based on generalized eigenvalue decomposition leads to the best results for the masking-based MVDR beamformer.

Probabilistic Approach Using Joint Clean and Noisy i-Vectors Modeling for Speaker Recognition

A new "data-driven" denoising technique operating in the i-vector space, based on joint modeling of clean and noisy i-vectors, that achieves up to 80% relative improvement in EER and can be used to compensate for multiple "unseen" noises.

On the use of X-vectors for Robust Speaker Recognition

This work presents an analysis of an SV system based on DNN embeddings (x-vectors) and confirms the robustness of such systems across multiple data domains, in clean, noisy, and reverberant environments.

Front-end speech enhancement for commercial speaker verification systems

Distant-talking speaker identification using a reverberation model with various artificial room impulse responses

A distant-talking speaker recognition method is proposed that uses a reverberation model built from various artificial room impulse responses, with different speaker and microphone positions, room sizes, and wall reflection coefficients, convolved with clean speech.

Joint Optimization of Neural Acoustic Beamforming and Dereverberation with x-Vectors for Robust Speaker Verification

Experiments show that jointly training the supportive neural network models along with the x-vector network within the classical speech enhancement framework brings significant performance gain for robust text-independent (TI) SV.