An explainability study of the constant Q cepstral coefficient spoofing countermeasure for automatic speaker verification

@inproceedings{Tak2020AnES,
  title={An explainability study of the constant Q cepstral coefficient spoofing countermeasure for automatic speaker verification},
  author={Hemlata Tak and Jose Patino and Andreas Nautsch and Nicholas W. D. Evans and Massimiliano Todisco},
  booktitle={Odyssey},
  year={2020}
}
Anti-spoofing for automatic speaker verification is now a well-established area of research, with three competitive challenges having been held in the last six years. A great deal of research effort over this time has been invested in the development of front-end representations tailored to the spoofing detection task. One such approach, known as constant Q cepstral coefficients (CQCCs), has been shown to be especially effective in detecting attacks implemented with a unit selection based speech…

Representation Selective Self-distillation and wav2vec 2.0 Feature Exploration for Spoof-aware Speaker Verification
TLDR
This study examines which feature space can effectively represent synthetic artifacts using wav2vec 2.0, studies which architecture can effectively utilize that space, and proposes a simple yet effective spoofing-aware speaker verification (SASV) methodology which takes advantage of the disentangled representations from the countermeasure system.
A multi-branch ResNet with discriminative features for detection of replay speech signals
TLDR
This work proposes a CQT-based modified group delay feature (CQTMGD) which can capture the phase information of CQT, and a multi-branch residual convolution network, ResNeWt, is proposed to distinguish replay attacks from bonafide attempts.
End-to-End Spectro-Temporal Graph Attention Networks for Speaker Verification Anti-Spoofing and Speech Deepfake Detection
TLDR
It is shown that better performance can be achieved when the fusion is performed within the model itself and when the representation is learned automatically from raw waveform inputs.
A Study On Data Augmentation In Voice Anti-Spoofing
Spoofing Attack Detection using the Non-linear Fusion of Sub-band Classifiers
TLDR
This work shows that a bank of very simple classifiers, each with a front-end tuned to the detection of different spoofing attacks and combined at the score level through non-linear fusion, can deliver better performance than more sophisticated ensemble solutions that rely upon complex neural network architectures.
A Comparative Study of Fusion Methods for SASV Challenge 2022
TLDR
This paper describes research into other fusion methods, including boosting over embeddings, which has not been used in anti-spoofing studies before, and a fusion over embeddings or scores obtained from ASV and CM models.
Texture-based Presentation Attack Detection for Automatic Speaker Verification
TLDR
This paper reports the exploration of texture descriptors applied to the analysis of speech spectrogram images and proposes a common fisher vector feature space based on a generative model for PAD solutions.
Known-unknown Data Augmentation Strategies for Detection of Logical Access, Physical Access and Speech Deepfake Attacks: ASVspoof 2021
Rohan Kumar Das. 2021 Edition of the Automatic Speaker Verification and Spoofing Countermeasures Challenge, 2021.
TLDR
This work considers a few data augmentation methods to have a robust spoofing countermeasure based on the known information from the challenge evaluation protocol and with some unknown approaches that can be useful for each of the three tracks.
AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks
TLDR
This work proposes a novel heterogeneous stacking graph attention layer which models artefacts spanning heterogeneous temporal and spectral domains with a heterogeneous attention mechanism and a stack node and proposes an approach that outperforms the current state-of-the-art by 20% relative.
Explaining Deep Learning Models for Spoofing and Deepfake Detection with Shapley Additive Explanations
TLDR
This paper describes the use of SHapley Additive exPlanations (SHAP) to gain new insights in spoofing detection, and demonstrates use of the tool in revealing unexpected classifier behaviour, the artefacts that contribute most to classifier outputs and differences in the behaviour of competing spoofing detection models.

References

A New Feature for Automatic Speaker Verification Anti-Spoofing: Constant Q Cepstral Coefficients
TLDR
This paper proposes a new feature for spoofing detection based on the constant Q transform, a perceptually-inspired time-frequency analysis tool popular in the study of music, and shows that, when coupled with a standard Gaussian mixture model-based classifier, the proposed constant Q cepstral coefficients (CQCCs) outperform all previously reported results by a significant margin.
Constant Q cepstral coefficients: A spoofing countermeasure for automatic speaker verification
Ensemble Models for Spoofing Detection in Automatic Speaker Verification
TLDR
This work investigates why some models on the PA dataset strongly outperform others and finds that spoofed recordings in the dataset tend to have longer silences at the end than genuine ones.
IIIT-H Spoofing Countermeasures for Automatic Speaker Verification Spoofing and Countermeasures Challenge 2019
TLDR
The experimental results on ASVspoof 2019 dataset reveal that the proposed instantaneous features are efficient in detecting VC and SS based attacks and comparable with baseline systems.
ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection
TLDR
The 2019 database, protocols and challenge results are described, and major findings which demonstrate the real progress made in protecting against the threat of spoofing and fake audio are outlined.
Investigation of Sub-Band Discriminative Information Between Spoofed and Genuine Speech
TLDR
This paper investigates discrimination between spoofed and genuine speech, as a function of frequency bands, across the speech bandwidth, to inform some proposed filter bank design approaches for discrimination of spoofed speech.
t-DCF: a Detection Cost Function for the Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification
TLDR
This paper aims at a migration from CM- to ASV-centric assessment with the aid of a new tandem detection cost function (t-DCF) metric, which extends the conventional DCF used in ASV research to scenarios involving spoofing attacks.
The ASVspoof 2019 database
TLDR
It was demonstrated that the spoofing data in the ASVspoof 2019 database have varied degrees of perceived quality and similarity to the target speakers, including spoofed data that cannot be differentiated from bona-fide utterances even by human subjects.
The SJTU Robust Anti-Spoofing System for the ASVspoof 2019 Challenge
TLDR
SJTU's submitted anti-spoofing system shows consistent performance improvement over all types of spoofing attacks, and log-CQT features are developed in conjunction with multi-layer convolutional neural networks for robust performance across both subtasks.
Utterance Verification for Text-Dependent Speaker Recognition: A Comparative Assessment Using the RedDots Corpus
TLDR
Different strategies for simultaneous ASV and UV in the context of short-duration, text-dependent speaker verification are reported, showing that the combination of utterance verification with automatic speaker verification is (almost) universally beneficial with significant performance improvements being observed.