UR Channel-Robust Synthetic Speech Detection System for ASVspoof 2021

Xinhui Chen, You Zhang, Ge Zhu, Zhiyao Duan
In this paper, we present the UR-AIR system submission to the logical access (LA) and the speech deepfake (DF) tracks of the ASVspoof 2021 Challenge. The LA and DF tasks focus on synthetic speech detection (SSD), i.e., detecting text-to-speech and voice conversion as spoofing attacks. Different from previous ASVspoof challenges, the LA task this year presents codec and transmission channel variability, while the new DF task presents general audio compression. Built upon our previous research work on…


The Vicomtech Audio Deepfake Detection System Based on Wav2vec2 for the 2022 ADD Challenge

This approach is based on the combination of a pre-trained wav2vec2 feature extractor and a downstream classifier to detect spoofed audio, which exploits the contextualized speech representations at the different transformer layers to fully capture discriminative information.

Synthetic speech detection using meta-learning with prototypical loss

This work addresses the generalizability of spoofing detection by proposing a prototypical loss under the meta-learning paradigm to mimic the unseen test scenario during training, and demonstrates that the proposed single system, without any data augmentation, can achieve performance competitive with the recent best anti-spoofing systems on the ASVspoof 2019 logical access (LA) task.

Rawboost: A Raw Data Boosting and Augmentation Method Applied to Automatic Speaker Verification Anti-Spoofing

Experiments show that RawBoost improves the performance of a state-of-the-art raw end-to-end baseline system by 27% relative and is only outperformed by solutions that either depend on external data or that require additional intervention at the model level.

A Probabilistic Fusion Framework for Spoofing Aware Speaker Verification

A probabilistic framework for fusing the ASV and CM subsystem scores is built and fusion strategies for direct inference and tuning to predict the SASV score are proposed based on the framework.

Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation

This paper reports on efforts to use self-supervised learning in the form of a wav2vec 2.0 front-end with fine tuning to obtain the lowest equal error rates reported in the literature for both the ASVspoof 2021 Logical Access and Deepfake databases.

A New Fusion Strategy for Spoofing Aware Speaker Verification

This work proposes a score scaling and multiplication strategy for inference and an SASV training strategy that significantly improve the SASV equal error rate (EER) from 19.31% for the best baseline to 1.58% on the official evaluation trials of the SASV challenge.
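The two EER figures quoted above can be turned into a relative reduction with simple arithmetic. The sketch below is only an illustration of that calculation; the helper name `relative_reduction` is hypothetical and does not come from the cited paper.

```python
def relative_reduction(baseline_eer: float, system_eer: float) -> float:
    """Return the relative EER reduction as a fraction of the baseline EER."""
    if baseline_eer <= 0:
        raise ValueError("baseline EER must be positive")
    return (baseline_eer - system_eer) / baseline_eer


# EER values quoted in the summary above, in percent.
reduction = relative_reduction(19.31, 1.58)
print(f"{reduction:.1%}")  # prints 91.8%
```

Under this definition, moving from 19.31% to 1.58% EER corresponds to a relative reduction of roughly 92%.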

A Comparative Study of Fusion Methods for SASV Challenge 2022

This paper describes the research of other fusion methods, including boosting over embeddings, which has not been used in anti-spoofing studies before, and a fusion over embeddings or scores obtained from ASV and CM models.

BPCNN: Bi-Point Input for Convolutional Neural Networks in Speaker Spoofing Detection

A method for convolutional neural networks that handle variable-length input features, called bi-point input, that reduces the relative equal error rate (EER) by approximately 17.2% and 43.8% on average for the logical access (LA) and physical access (PA) tasks, respectively.

Adversarial Speaker Distillation for Countermeasure Model on Automatic Speaker Verification

The proposed adversarial speaker distillation ResNetSE (ASD-ResNetSE) model is an improved version of the knowledge distillation method combined with generalized end-to-end (GE2E) pre-training and adversarial tuning that reaches 0.2695 min t-DCF and 3.54% EER in the evaluation phase of the ASVspoof 2021 Logical Access task.

Explainable deepfake and spoofing detection: an attack analysis using SHapley Additive exPlanations

It is shown that visualisations of SHAP results can be used to identify attack-specific artefacts and the differences and consistencies between synthetic speech and converted voice spoofing attacks.



Generalization of Audio Deepfake Detection

This paper focuses on overcoming the issue of generalization ability of spoofing countermeasures by using large margin cosine loss function (LMCL) and online frequency masking augmentation to force the neural network to learn more robust feature embeddings.
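As a rough illustration of the large margin cosine loss mentioned above, the following sketch computes the loss for a single sample from its cosine similarities to each class weight vector. It follows the standard LMCL formulation (subtract a margin `m` from the target-class cosine, scale by `s`, then apply softmax cross-entropy); the function name and the values of `s` and `m` are assumptions chosen for illustration, not settings taken from the cited paper.

```python
import math


def lmcl_loss(cosines, target, s=30.0, m=0.35):
    """Large margin cosine loss for one sample.

    cosines: cosine similarity between the embedding and each class weight.
    target:  index of the true class.
    s, m:    scale and margin (illustrative values, assumed here).
    """
    # Subtract the margin from the target-class cosine only, then scale.
    logits = [s * (c - m) if j == target else s * c
              for j, c in enumerate(cosines)]
    # Log-sum-exp with max subtraction for numerical stability.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    # Negative log-probability of the target class.
    return log_norm - logits[target]


# Two classes (e.g. bona fide vs. spoof); the sample belongs to class 0.
with_margin = lmcl_loss([0.8, 0.1], target=0)
without_margin = lmcl_loss([0.8, 0.1], target=0, m=0.0)
print(with_margin > without_margin)  # prints True
```

With `m=0.0` the expression reduces to ordinary scaled softmax cross-entropy; a positive margin makes the same sample incur a larger loss, which is what pushes embeddings of different classes further apart.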

Replay and Synthetic Speech Detection with Res2Net Architecture

Xu Li, N. Li, H. Meng. ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021.
The Res2Net model consistently outperforms ResNet34 and ResNet50 by a large margin in both physical access (PA) and logical access (LA) of the ASVspoof 2019 corpus and the constant-Q transform (CQT) achieves the most promising performance in both PA and LA scenarios.

An Empirical Study on Channel Effects for Synthetic Voice Spoofing Countermeasure Systems

A cross-dataset study on several state-of-the-art CM systems is conducted and it is hypothesized that channel mismatch among these datasets is one important reason for performance degradation, and several channel robust strategies are proposed.

A Large-Scale Open-Source Acoustic Simulator for Speaker Recognition

While error rates increase considerably under degraded speech conditions, large relative equal error rate (EER) reductions were observed when using a PLDA model trained with a large number of degraded sessions per speaker.

ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection

The 2019 database, protocols and challenge results are described, and major findings which demonstrate the real progress made in protecting against the threat of spoofing and fake audio are outlined.

One-Class Learning Towards Synthetic Voice Spoofing Detection

This work proposes an anti-spoofing system to detect unknown synthetic voice spoofing attacks (i.e., text-to-speech or voice conversion) using one-class learning, which achieves an equal error rate (EER) that outperforms all existing single systems.

A Comparative Study on Recent Neural Spoofing Countermeasures for Synthetic Speech Detection

A comparison of countermeasure models on the ASVspoof 2019 logical access scenario takes into account common strategies to deal with input trials of varied length, recently proposed margin-based training criteria, and widely used front ends.

ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech

Advances in anti-spoofing: from the perspective of ASVspoof challenges

A literature review of ASV spoof detection, novel acoustic feature representations, deep learning, end-to-end systems, etc., along with recent efforts to develop countermeasures for the spoof speech detection (SSD) task, is presented.

ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification

The proposed ECAPA-TDNN architecture significantly outperforms state-of-the-art TDNN-based systems on the VoxCeleb test sets and the 2019 VoxCeleb Speaker Recognition Challenge.