An explainability study of the constant Q cepstral coefficient spoofing countermeasure for automatic speaker verification

Hemlata Tak, Jose Patino, Andreas Nautsch, Nicholas W. D. Evans, Massimiliano Todisco
Anti-spoofing for automatic speaker verification is now a well-established area of research, with three competitive challenges having been held in the last 6 years. A great deal of research effort over this time has been invested in the development of front-end representations tailored to the spoofing detection task. One such approach, known as constant Q cepstral coefficients (CQCCs), has been shown to be especially effective in detecting attacks implemented with a unit selection based speech… 
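The abstract is cut off before it describes how CQCCs are computed. Following the original CQCC formulation (constant Q transform, log power spectrum, uniform resampling, DCT), the steps after the CQT might be sketched as below. This assumes the CQT power spectrum of a single frame is already available; the function names, the use of linear interpolation for resampling, and the small flooring constant are illustrative assumptions, not the authors' implementation.

```python
import math

def cqt_center_freqs(f_min, bins_per_octave, n_bins):
    # Geometrically spaced centre frequencies: f_k = f_min * 2**(k / B).
    return [f_min * 2 ** (k / bins_per_octave) for k in range(n_bins)]

def uniform_resample(freqs, values, n_points):
    """Linearly interpolate values sampled at geometric frequencies onto
    n_points uniformly spaced frequencies covering the same range."""
    lo, hi = freqs[0], freqs[-1]
    out, j = [], 0
    for i in range(n_points):
        f = lo + (hi - lo) * i / (n_points - 1)
        # Advance to the geometric interval [freqs[j], freqs[j + 1]] containing f.
        while j < len(freqs) - 2 and freqs[j + 1] < f:
            j += 1
        f0, f1 = freqs[j], freqs[j + 1]
        t = (f - f0) / (f1 - f0)
        out.append(values[j] + t * (values[j + 1] - values[j]))
    return out

def dct_ii(x, n_coeffs):
    # c(p) = sum_l x(l) * cos(p * (l + 1/2) * pi / L), with 0-based l.
    L = len(x)
    return [sum(x[l] * math.cos(p * (l + 0.5) * math.pi / L) for l in range(L))
            for p in range(n_coeffs)]

def cqcc_from_cqt_power(power, f_min, bins_per_octave, n_uniform, n_coeffs):
    """Hypothetical helper: CQT power spectrum of one frame -> cepstral coefficients."""
    freqs = cqt_center_freqs(f_min, bins_per_octave, len(power))
    log_power = [math.log(p + 1e-10) for p in power]  # floor avoids log(0)
    resampled = uniform_resample(freqs, log_power, n_uniform)
    return dct_ii(resampled, n_coeffs)
```

The uniform resampling step is what allows a standard DCT to be applied, because the DCT assumes linearly spaced samples while the CQT bins are geometrically spaced.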


Representation Selective Self-distillation and wav2vec 2.0 Feature Exploration for Spoof-aware Speaker Verification
This study examines which feature space can effectively represent synthetic artifacts using wav2vec 2.0 and which architecture can effectively utilize that space, and proposes a simple yet effective spoofing-aware speaker verification (SASV) methodology that takes advantage of the disentangled representations from the countermeasure system.
A Comparative Study on Recent Neural Spoofing Countermeasures for Synthetic Speech Detection
A comparison of countermeasure models on the ASVspoof 2019 logical access scenario takes into account common strategies for dealing with input trials of varied length, recently proposed margin-based training criteria, and widely used front ends.
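One widely used way to deal with trials of varied length is to force every waveform to a fixed number of samples, cropping long trials and repeat-padding short ones. The comparison above considers several such strategies; this sketch shows only crop/repeat-pad, and the helper name `fix_length` is hypothetical.

```python
def fix_length(samples, target_len):
    """Return exactly target_len samples: crop long trials, repeat-pad short ones."""
    if not samples:
        raise ValueError("empty trial")
    if len(samples) >= target_len:
        return samples[:target_len]  # keep the first target_len samples
    reps = -(-target_len // len(samples))  # ceiling division
    return (samples * reps)[:target_len]  # tile the trial, then trim
```

Repeat-padding avoids feeding long runs of artificial zeros to the model, although it can introduce discontinuities where the copies join.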
A multi-branch ResNet with discriminative features for detection of replay speech signals
This work proposes a CQT-based modified group delay feature (CQTMGD), which can capture the phase information of the CQT, and a multi-branch residual convolution network, ResNeWt, to distinguish replay attacks from bona fide attempts.
Investigating self-supervised front ends for speech spoofing countermeasures
Xin Wang, J. Yamagishi. The Speaker and Language Recognition Workshop (Odyssey 2022), 2022.
This study uses pre-trained self-supervised speech models as the front end of spoofing CMs and finds that, when a good pre-trained front end was fine-tuned with either a shallow or a deep neural-network-based back end on the ASVspoof 2019 logical access (LA) training set, the resulting CM not only achieved a low EER on the 2019 LA test set but also significantly outperformed the baseline.
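The equal error rate (EER) is the operating point at which the CM miss rate (bona fide trials rejected) equals its false-alarm rate (spoofed trials accepted). A minimal threshold-sweep sketch, assuming higher CM scores indicate bona fide speech; the function name is hypothetical, and evaluation toolkits typically interpolate between thresholds rather than picking the closest discrete one.

```python
def compute_eer(bona_scores, spoof_scores):
    """Approximate EER of a CM whose higher scores mean 'more likely bona fide'."""
    thresholds = sorted(set(bona_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # Miss: a bona fide trial scored below the threshold is rejected.
        p_miss = sum(s < t for s in bona_scores) / len(bona_scores)
        # False alarm: a spoofed trial scored at or above the threshold is accepted.
        p_fa = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            best_gap, eer = gap, (p_miss + p_fa) / 2
    return eer
```

For example, `compute_eer([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])` returns `0.25`: at a threshold of 0.6, one of four bona fide trials is rejected and one of four spoofed trials is accepted.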
End-to-End Spectro-Temporal Graph Attention Networks for Speaker Verification Anti-Spoofing and Speech Deepfake Detection
It is shown that better performance can be achieved when the fusion is performed within the model itself and when the representation is learned automatically from raw waveform inputs.
End-to-End anti-spoofing with RawNet2
Modifications to the original RawNet2 architecture that allow it to be applied to anti-spoofing are described, and the results are reproducible with open-source software.
A Study On Data Augmentation In Voice Anti-Spoofing
Spoofing Attack Detection using the Non-linear Fusion of Sub-band Classifiers
This work shows that a bank of very simple classifiers, each with a front end tuned to the detection of different spoofing attacks and combined at the score level through non-linear fusion, can deliver better performance than more sophisticated ensemble solutions that rely upon complex neural network architectures.
Graph Attention Networks for Anti-Spoofing
Experiments performed on the ASVspoof 2019 logical access database show that the fusion of GAT-based models with more conventional countermeasures delivers a 47% relative improvement in performance compared to the best-performing single GAT system.
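A relative improvement expresses the reduction of an error metric (such as EER or min t-DCF; the summary does not say which) as a fraction of the reference system's error. The sketch below uses purely hypothetical numbers, not values from the paper: an error of 2.00 for the single system reduced to 1.06 by fusion corresponds to a 47% relative improvement.

```python
def relative_improvement(baseline_error, new_error):
    """Percentage reduction of an error metric relative to a baseline system."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error

# Hypothetical errors (not from the paper): single system 2.00, fused system 1.06.
print(round(relative_improvement(2.00, 1.06), 2))  # 47.0
```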
Data Quality as Predictor of Voice Anti-Spoofing Generalization
A novel interpretative framework for gauging the impact of data quality upon anti-spoofing performance is outlined and the impacts of long-term spectral information, speaker population, signal-to-noise ratio, and selected voice quality features are assessed.


A New Feature for Automatic Speaker Verification Anti-Spoofing: Constant Q Cepstral Coefficients
This paper proposes a new feature for spoofing detection based on the constant Q transform, a perceptually-inspired time-frequency analysis tool popular in the study of music and shows that, when coupled with a standard Gaussian mixture model-based classifier, the proposed constant Q cepstral coefficients (CQCCs) outperform all previously reported results by a significant margin.
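The defining property of the constant Q transform is that its centre frequencies are geometrically spaced, so the ratio Q of each bin's centre frequency to its bandwidth is the same for every bin. With B bins per octave, f_k = f_min · 2^(k/B) and Q = 1/(2^(1/B) − 1), which gives finer frequency resolution at low frequencies and finer time resolution at high frequencies. A small sketch; the function name and the parameter values in the usage line are illustrative assumptions.

```python
import math

def cqt_geometry(f_min, f_max, bins_per_octave):
    """Centre frequencies and (constant) Q factor of a constant Q filter bank."""
    # Number of bins whose centre frequency does not exceed f_max.
    n_bins = int(math.floor(bins_per_octave * math.log2(f_max / f_min))) + 1
    freqs = [f_min * 2 ** (k / bins_per_octave) for k in range(n_bins)]
    # f_{k+1} - f_k = f_k * (2**(1/B) - 1), so f_k / (f_{k+1} - f_k) is constant.
    q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1.0)
    return freqs, q

freqs, q = cqt_geometry(15.0, 8000.0, 96)  # illustrative values only
print(len(freqs), round(q, 1))
```

Doubling B halves the spacing between bins in log-frequency and roughly doubles Q, trading time resolution for frequency resolution.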
Constant Q cepstral coefficients: A spoofing countermeasure for automatic speaker verification
Ensemble Models for Spoofing Detection in Automatic Speaker Verification
This work investigates why some models on the PA dataset strongly outperform others and finds that spoofed recordings in the dataset tend to have longer silences at the end than genuine ones.
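A simple way to check recordings for the artefact described above is to measure how much low-energy audio sits at the end of each file. The following sketch uses a frame-energy threshold; the function name, the 400-sample frames (25 ms at 16 kHz) and the −50 dBFS threshold are arbitrary assumptions, not values from the cited work.

```python
import math

def trailing_silence_seconds(samples, sample_rate, frame_len=400, threshold_db=-50.0):
    """Duration, in seconds, of consecutive low-energy frames at the end of a recording.

    A frame is silent when its RMS level in dB relative to full scale (amplitude 1.0)
    is below threshold_db. Any partial frame at the very end is ignored."""
    n_frames = len(samples) // frame_len
    silent = 0
    for i in range(n_frames - 1, -1, -1):  # walk backwards from the last full frame
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        level_db = 20 * math.log10(rms + 1e-12)  # small offset avoids log10(0)
        if level_db >= threshold_db:
            break  # first non-silent frame found
        silent += 1
    return silent * frame_len / sample_rate
```

Comparing the distribution of this value for bona fide and spoofed files is one way to see whether a dataset contains such a shortcut cue.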
IIIT-H Spoofing Countermeasures for Automatic Speaker Verification Spoofing and Countermeasures Challenge 2019
The experimental results on the ASVspoof 2019 dataset reveal that the proposed instantaneous features are effective in detecting voice conversion (VC) and speech synthesis (SS) based attacks and perform comparably to the baseline systems.
ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection
The 2019 database, protocols and challenge results are described, and major findings which demonstrate the real progress made in protecting against the threat of spoofing and fake audio are outlined.
Investigation of Sub-Band Discriminative Information Between Spoofed and Genuine Speech
This paper investigates the discrimination between spoofed and genuine speech as a function of frequency band across the speech bandwidth, in order to inform proposed filter bank design approaches for the detection of spoofed speech.
t-DCF: a Detection Cost Function for the Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification
The paper advocates a migration from CM-centric to ASV-centric assessment with the aid of a new tandem detection cost function (t-DCF) metric, which extends the conventional DCF used in ASV research to scenarios involving spoofing attacks.
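In the form used for ASVspoof 2019 evaluation, as we understand it, the t-DCF of a CM at threshold s reduces to t-DCF(s) = C1 · P_miss^cm(s) + C2 · P_fa^cm(s), where the constants C1 and C2 depend on the ASV system's error rates, the class priors and the detection costs. It is normalised by min(C1, C2), the cost of a CM that accepts or rejects every trial, and minimised over s. In this sketch C1 and C2 are simply inputs (the test values are arbitrary), and the function name is hypothetical.

```python
def min_norm_tdcf(bona_scores, spoof_scores, c1, c2):
    """Minimum normalised t-DCF of a CM, given precomputed tandem constants C1, C2.

    Higher CM scores are assumed to mean 'more likely bona fide'."""
    thresholds = sorted(set(bona_scores) | set(spoof_scores))
    thresholds.append(float("inf"))  # a threshold above every score rejects all trials
    best = float("inf")
    for t in thresholds:
        p_miss = sum(s < t for s in bona_scores) / len(bona_scores)   # bona fide rejected
        p_fa = sum(s >= t for s in spoof_scores) / len(spoof_scores)  # spoof accepted
        best = min(best, c1 * p_miss + c2 * p_fa)
    # Normalise by the better of the two trivial CMs (accept all: C2, reject all: C1).
    return best / min(c1, c2)
```

Because the sweep includes both the accept-all and the reject-all thresholds, the normalised value never exceeds 1; a value of 1 means the CM is no better than a trivial one.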
The ASVspoof 2019 database
It was demonstrated that the spoofing data in the ASVspoof 2019 database have varied degrees of perceived quality and similarity to the target speakers, including spoofed data that cannot be differentiated from bona fide utterances even by human subjects.
The SJTU Robust Anti-Spoofing System for the ASVspoof 2019 Challenge
SJTU's submitted anti-spoofing system, which combines Log-CQT features with multi-layer convolutional neural networks for robust performance across both subtasks, shows consistent performance improvements over all types of spoofing attacks.