Blind Speech Signal Quality Estimation for Speaker Verification Systems

@inproceedings{Lavrentyeva2020BlindSS,
  title={Blind Speech Signal Quality Estimation for Speaker Verification Systems},
  author={Galina Lavrentyeva and Marina Volkova and Anastasia Avdeeva and Sergey Novoselov and Artem Gorlanov and Tseren Andzhukaev and Artem Ivanov and Alexander Kozlov},
  booktitle={INTERSPEECH},
  year={2020}
}
The problem of system performance degradation in mismatched acoustic conditions has been widely acknowledged in the community and is common to many fields. Present state-of-the-art deep speaker embedding models are domain-sensitive. The main idea of the current research is to develop a single method for automatic signal quality estimation that can evaluate short-term signal characteristics. This paper presents a neural network based approach for blind speech signal quality…
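The abstract describes a neural network that maps short-term signal characteristics to a quality estimate. A minimal sketch of that idea, assuming hypothetical per-frame features (log energy, zero-crossing rate) and an untrained toy MLP in place of the paper's actual architecture and training targets:

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Slice a 1-D signal into overlapping short-term frames."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n)])

def frame_features(frames):
    """Simple per-frame descriptors: log energy and zero-crossing rate.
    (Illustrative stand-ins for whatever features the paper uses.)"""
    log_e = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.stack([log_e, zcr], axis=1)

class TinyQualityNet:
    """Two-layer MLP mapping per-frame features to a scalar quality score.

    Weights are random here; a real system would train them against
    reference quality labels (e.g. measured SNR or reverberation time)."""
    def __init__(self, in_dim=2, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0, 0.5, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0, 0.5, (hidden, 1))
        self.b2 = np.zeros(1)

    def __call__(self, feats):
        h = np.tanh(feats @ self.w1 + self.b1)
        return (h @ self.w2 + self.b2).ravel()  # one score per frame

# Per-frame scores can be pooled into an utterance-level estimate.
x = np.random.default_rng(1).normal(size=16000)   # 1 s of noise at 16 kHz
scores = TinyQualityNet()(frame_features(frame_signal(x)))
utterance_quality = float(scores.mean())
```

The point is the shape of the pipeline (frame, featurize, score, pool), not any specific feature set or network size.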

Citations

Directional and Qualitative Feature Classification for Speaker Diarization with Dual Microphone Arrays
A set of directional and qualitative features extracted from a dual-microphone-array signal is evaluated; specific sets of features yield satisfactory classification accuracy and can be further investigated in experiments combining them with biometric and other types of properties.
Investigation of Different Calibration Methods for Deep Speaker Embedding based Verification Systems
This paper presents an investigation of several score calibration methods: a classical approach based on the logistic regression model; the recently presented magnitude estimation network MagnetO, which uses activations from the pooling layer of the trained deep speaker extractor; and a generalization of that approach based on separate scale- and offset-prediction neural networks.
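The classical logistic-regression calibration mentioned in this summary fits an affine map from raw verification scores to calibrated logits. A minimal NumPy sketch on toy data (not the cited paper's implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_affine_calibration(scores, labels, lr=0.05, steps=2000):
    """Fit logit = a*score + b by minimizing binary cross-entropy,
    i.e. plain logistic regression on one input feature."""
    a, b = 1.0, 0.0
    for _ in range(steps):
        p = sigmoid(a * scores + b)
        grad = p - labels                    # d(loss)/d(logit)
        a -= lr * np.mean(grad * scores)
        b -= lr * np.mean(grad)
    return a, b

# Toy data: target trials score higher on average than non-target trials.
rng = np.random.default_rng(0)
tar = rng.normal(2.0, 1.0, 500)
non = rng.normal(-2.0, 1.0, 500)
scores = np.concatenate([tar, non])
labels = np.concatenate([np.ones(500), np.zeros(500)])
a, b = train_affine_calibration(scores, labels)
```

After fitting, `a*score + b` can be treated as a calibrated log-likelihood ratio for downstream thresholding.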
STC-Innovation Speaker Recognition Systems for Far-Field Speaker Verification Challenge 2020
A number of experiments on x-vector-based and ResNet-like architectures show that ResNet-based networks outperform x-vector-based systems.
Magnitude-aware Probabilistic Speaker Embeddings
A new probabilistic speaker embedding extractor is proposed that uses the information encoded in the embedding magnitude and leverages it in the speaker verification pipeline; several quality-aware diarization methods incorporating the magnitudes are also proposed.

References

Showing 1-10 of 46 references
STC Speaker Recognition Systems for the VOiCES From a Distance Challenge
This work investigates different deep neural network architectures for speaker embedding extraction to solve the task of speaker recognition in single-channel distant/far-field audio under noisy conditions, and shows that deep networks with residual frame-level connections outperform shallower architectures.
Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances
This paper presents approaches aimed at improving the quality of far-field speaker verification systems in the presence of environmental noise and reverberation and at reducing system quality degradation for short utterances, and confirms that ResNet architectures outperform the standard x-vector approach in terms of speaker verification quality.
Scores Calibration in Speaker Recognition Systems
The effects of speech duration variability on calibration when enrollment and test speech utterances originate from the same channel are investigated, and an effective method of score stabilization is presented.
Estimation of Room Acoustic Parameters: The ACE Challenge
The acoustic characterization of environments (ACE) challenge showed that T60 estimation is a mature field where analytical approaches dominate whilst DRR estimation is one of the less mature fields where machine learning approaches are currently more successful.
The ACE challenge — Corpus description and performance evaluation
The Acoustic Characterization of Environments (ACE) Challenge is a competition to identify the most promising non-intrusive DRR and T60 estimation methods using real noisy reverberant speech.
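The ACE Challenge targets blind T60 and DRR estimation from noisy reverberant speech. As a reference point for what T60 means, here is a sketch of the non-blind case: estimating T60 from a known room impulse response via Schroeder backward integration (a standard technique, shown on a synthetic exponential-decay RIR; the blind estimators in the challenge work without access to the RIR):

```python
import numpy as np

def t60_from_rir(h, fs):
    """Estimate T60 from a room impulse response via Schroeder backward
    integration: fit the decay slope between -5 dB and -25 dB on the
    energy decay curve and extrapolate to -60 dB."""
    edc = np.cumsum(h[::-1] ** 2)[::-1]              # energy decay curve
    edc_db = 10 * np.log10(edc / edc[0])
    t = np.arange(len(h)) / fs
    mask = (edc_db <= -5) & (edc_db >= -25)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # dB per second
    return -60.0 / slope

# Synthetic RIR: exponential amplitude decay chosen so the true T60 is 0.5 s.
fs, t60_true = 16000, 0.5
t = np.arange(int(fs * t60_true * 2)) / fs
h = np.exp(-3 * np.log(10) * t / t60_true)           # 60 dB energy decay over t60_true
est = t60_from_rir(h, fs)
```

On this idealized RIR the estimate recovers the true 0.5 s value; real RIRs and blind estimation from speech are far harder, which is the challenge's point.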
The STC ASR System for the VOiCES from a Distance Challenge 2019
The Speech Technology Center (STC) automatic speech recognition (ASR) system for the "VOiCES from a Distance Challenge 2019" participated in the Fixed condition of the ASR task, meaning the only training data available was an 80-hour subset of the LibriSpeech corpus.
Perceptual Objective Listening Quality Assessment (POLQA), The Third Generation ITU-T Standard for End-to-End Speech Quality Measurement Part I-Temporal Alignment
The authors present the Perceptual Objective Listening Quality Assessment (POLQA), the third-generation speech quality measurement algorithm, which provides a new measurement standard for predicting Mean Opinion Scores that outperforms the older PESQ standard.
Blind estimators for reverberation time and direct-to-reverberant energy ratio using subband speech decomposition
Algorithms for estimating the reverberation time and direct-to-reverberant energy ratio are described; evaluations indicate the effectiveness of both techniques, particularly in high-SNR situations.
Quality Measure Functions for Calibration of Speaker Recognition Systems in Various Duration Conditions
Results highlight the importance of considering quality metrics such as duration when calibrating scores for automatic speaker recognition systems, and the need for a calibration approach that handles these effects using quality measure functions (QMFs).
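A QMF-based calibration extends the plain affine score calibration with one or more quality inputs. A minimal sketch under toy assumptions, with log duration as the single (hypothetical) quality measure:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_qmf_calibration(scores, qual, labels, lr=0.05, steps=3000):
    """Logistic-regression calibration with one quality measure function:
    logit = a*score + c*quality + b, where `quality` could be, e.g.,
    the log duration of the shorter side of the trial."""
    w = np.zeros(3)                                  # [a, c, b]
    X = np.stack([scores, qual, np.ones_like(scores)], axis=1)
    for _ in range(steps):
        grad = sigmoid(X @ w) - labels
        w -= lr * (X.T @ grad) / len(labels)
    return w

# Toy trials where shorter utterances produce lower-margin scores.
rng = np.random.default_rng(0)
n = 1000
labels = (rng.random(n) < 0.5).astype(float)
dur = rng.uniform(2.0, 30.0, n)                      # seconds of test speech
margin = 2.0 * np.log(dur) / np.log(30.0)            # shorter => weaker evidence
scores = np.where(labels == 1, margin, -margin) + rng.normal(0, 1.0, n)
w = train_qmf_calibration(scores, np.log(dur), labels)
```

The cited paper's actual QMFs and training setup may differ; this only shows how a quality term enters the calibration logit alongside the raw score.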