The SYSU system for the interspeech 2015 automatic speaker verification spoofing and countermeasures challenge

@inproceedings{Weng2015TheSS,
  title={The {SYSU} system for the {Interspeech} 2015 automatic speaker verification spoofing and countermeasures challenge},
  author={Shi-Yan Weng and Shushan Chen and Lei Yu and Xuewei Wu and Weicheng Cai and Zhi Liu and Yiming Zhou and Ming Li},
  booktitle={2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)},
  year={2015},
  pages={152--155}
}
  • Shi-Yan Weng, Shushan Chen, +5 authors Ming Li
  • Published 2015
  • Computer Science
  • 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
Many existing speaker verification systems are reported to be vulnerable to different spoofing attacks, for example speech synthesis, voice conversion, and playback. To detect these spoofed speech signals as a countermeasure, we propose a score-level fusion approach with several different i-vector subsystems. We show that the acoustic-level Mel-frequency cepstral coefficient (MFCC) features, the phase-level modified group delay cepstral coefficients (MGDCC) and the phonetic…
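As an illustration of the score-level fusion mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes each subsystem has already produced one detection score per trial, z-normalizes each subsystem's scores, and combines them with a weighted sum. The function names, the equal default weights, the subsystem labels ("mfcc", "mgdcc", "phonetic") and the score values are all hypothetical; the paper may use a different normalization or learned fusion weights.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Shift and scale one subsystem's scores to zero mean, unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: the subsystem carries no ranking information.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(subsystem_scores, weights=None):
    """Weighted sum of z-normalized scores, one fused score per trial.

    subsystem_scores: dict mapping a subsystem name to its list of scores.
    weights: dict mapping the same names to fusion weights
             (assumption: equal weights when omitted).
    """
    names = list(subsystem_scores)
    n_trials = len(subsystem_scores[names[0]])
    if any(len(subsystem_scores[name]) != n_trials for name in names):
        raise ValueError("all subsystems must score the same trials")
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}
    normalized = {name: z_normalize(subsystem_scores[name]) for name in names}
    return [
        sum(weights[name] * normalized[name][i] for name in names)
        for i in range(n_trials)
    ]


# Hypothetical scores for four trials (higher = more likely genuine speech).
scores = {
    "mfcc": [2.0, 1.5, -1.0, -2.5],
    "mgdcc": [0.8, 0.9, -0.2, -1.5],
    "phonetic": [10.0, 4.0, 6.0, -20.0],
}
fused = fuse_scores(scores)
```

Normalizing before summing keeps a subsystem with a large score range (here the hypothetical "phonetic" scores) from dominating the fused result.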
Novel Nonlinear Prediction Based Features for Spoofed Speech Detection
TLDR
The proposed countermeasure is evaluated on the corpora provided for the ASVspoof 2015 challenge, using a Gaussian Mixture Model (GMM)-based classifier and the percentage Equal Error Rate (EER) as the performance measure.
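The Equal Error Rate used as a performance measure throughout these summaries can be sketched as follows. This is a simple threshold sweep under stated assumptions, not the official ASVspoof scoring tool: higher scores are taken to mean "genuine", every observed score is tried as a threshold, and the EER is approximated as the mean of the false acceptance and false rejection rates at the threshold where they are closest. The function name and the example scores are hypothetical.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    A trial is accepted as genuine when its score is >= the threshold.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False acceptance rate: spoofed trials accepted as genuine.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: genuine trials rejected as spoofed.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical detection scores.
genuine = [0.9, 0.8, 0.7, 0.3]
spoofed = [0.1, 0.2, 0.4, 0.6]
eer = equal_error_rate(genuine, spoofed)  # 0.25 for these scores
```

With few trials the two error rates may never be exactly equal, which is why the sketch averages them at the closest point instead of interpolating.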
Synthetic speech detection using fundamental frequency variation and spectral features
TLDR
This paper proposes a new approach to detecting synthetic speech using score-level fusion of three front-end features, namely constant Q cepstral coefficients (CQCCs), the all-pole group delay function (APGDF) and fundamental frequency variation (FFV), which outperforms all existing baseline features for both known and unknown attacks.
Anti-spoofing Methods for Automatic Speaker Verification System
TLDR
An overview of different acoustic feature spaces and classifiers in automatic speaker verification systems, aimed at determining reliable and robust countermeasures against spoofing attacks; it demonstrates that the linear SVM classifier outperforms the conventional GMM approach.
Data selection for i-vector based automatic speaker verification anti-spoofing
  • C. Hanilçi
  • Computer Science, Mathematics
  • Digit. Signal Process.
  • 2018
TLDR
This study focuses on improving the spoofing detection performance of the i-vector system using cosine and probabilistic linear discriminant analysis (PLDA) scoring, and proposes a scheme that outperforms the simple Gaussian mixture model (GMM) classifier and i-vector countermeasures.
The GMM and I-Vector Systems Based on Spoofing Algorithms for Speaker Spoofing Detection
TLDR
Experiments on the ASVspoof 2019 challenge logical access scenario show that the GMM classifier with the Support Vector Machine (SVM) scoring method based on different spoofing algorithms obtains the best performance on the evaluation set, with an EER of 7.03%.
Spoofing detection goes noisy: An analysis of synthetic speech detection in the presence of additive noise
TLDR
A significant gap is revealed in the performance of state-of-the-art spoofing detectors between clean and noisy conditions, and a study with two score fusion strategies shows that combining different feature-based systems improves recognition accuracy for known and unknown attacks in both clean and noisy conditions.
Cochlear Filter and Instantaneous Frequency Based Features for Spoofed Speech Detection
  • T. Patel, H. Patil
  • Computer Science
  • IEEE Journal of Selected Topics in Signal Processing
  • 2017
TLDR
It was observed that subband energy variations across CFCCIF, when estimated by symmetric difference (CFCCIFS), gave better discriminative properties than CFCCIF, and that VC speech is relatively more difficult to detect than SS by the SSD system.
Zero resource anti-spoofing detection for unit selection based synthetic speech using image spectrogram artifacts
TLDR
This paper proposes a training-free detection algorithm to counter unit-selection-based synthetic speech, which exploits the presence of artifacts in the image spectrogram to perform detection.
End-to-end spoofing detection with raw waveform CLDNNS
TLDR
A novel raw-waveform-based deep model for spoofing detection is presented, which jointly acts as a feature extractor and classifier, thus allowing it to directly classify speech signals.
An investigation of spectral feature partitioning for replay attacks detection
TLDR
A statistical measure based on the Rayleigh quotient is proposed to investigate a feature partition capable of discerning genuine and playback speech under unseen conditions, and the effectiveness of this approach is confirmed.

References

SHOWING 1-10 OF 32 REFERENCES
Joint Speaker Verification and Antispoofing in the i-Vector Space
TLDR
Back-end generative models for more generalized countermeasures are explored, and the synthesis-channel subspace is modeled to perform speaker verification and antispoofing jointly in the i-vector space, which is a well-established technique for speaker modeling.
Detecting Converted Speech and Natural Speech for anti-Spoofing Attack in Speaker Recognition
TLDR
Experiments show that features derived from the phase spectrum substantially outperform the mel-frequency cepstral coefficients (MFCCs): even without converted speech for training, the equal error rate (EER) is reduced from 20.20% with MFCCs to 2.35%.
Synthetic speech detection using temporal modulation feature
TLDR
From the synthetic speech detection results, the modulation features provide complementary information to magnitude/phase features, and the best detection performance is obtained by fusing phase modulation features and phase features, yielding an equal error rate.
A novel scheme for speaker recognition using a phonetically-aware deep neural network
We propose a novel framework for speaker recognition in which extraction of sufficient statistics for the state-of-the-art i-vector model is driven by a deep neural network (DNN) trained for…
Speaker verification and spoken language identification using a generalized i-vector framework with phonetic tokenizations and tandem features
TLDR
The tokens for calculating the zero-order statistics are extended from the MFCC-trained Gaussian Mixture Model (GMM) components to phonetic phonemes, 3-grams and tandem-feature-trained GMM components using phoneme posterior probabilities.
Comparing prosodic models for speaker recognition
TLDR
These experiments show that simple prosodic systems with features extracted from fixed-size contour segments, without knowledge of phone/pseudo-syllable level information, still provide significant performance improvement when fused with a state-of-the-art cepstral-based system.
Deep Neural Networks for extracting Baum-Welch statistics for Speaker Recognition
TLDR
Although the proposed i-vectors yield inferior performance compared to the standard ones, they are capable of attaining 16% relative improvement when fused with them, meaning that they carry useful complementary information about the speaker’s identity.
Automatic recognition of speaker physical load using posterior probability based features from acoustic and phonetic tokens
  • Ming Li
  • Computer Science
  • INTERSPEECH
  • 2014
TLDR
The tokens for calculating the posterior probability or zero-order statistics are extended from the conventional MFCC-trained Gaussian Mixture Model (GMM) components to parallel phonetic phonemes and tandem-feature-trained GMM components to extract the phoneme posterior probabilities.
Front-End Factor Analysis for Speaker Verification
TLDR
An extension of previous work that proposes a new speaker representation for speaker verification: a new low-dimensional speaker- and channel-dependent space is defined using simple factor analysis and named the total variability space because it models both speaker and channel variabilities.
Extended phone log-likelihood ratio features and acoustic-based i-vectors for language recognition
This paper presents new techniques with relevant improvements added to the primary system presented by our group to the Albayzin 2012 LRE competition, where the use of any additional corpora for…