Corpus ID: 11328665

Combining amplitude and phase-based features for speaker verification with short duration utterances

Md. Jahangir Alam, Patrick Kenny, Themos Stafylakis
Due to the increasing use of fusion in speaker recognition systems, one trend of current research activity focuses on new features that capture information complementary to MFCCs (Mel-frequency cepstral coefficients) to improve speaker recognition performance. [...] To compute phase-based features, we choose modified group delay-, all-pole group delay-, and linear prediction residual phase-based features.
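As a rough illustration of what a group delay-based feature starts from, the sketch below computes the plain group delay function of one frame using the standard DFT identity tau(k) = (X_R Y_R + X_I Y_I) / |X|^2, where Y is the DFT of n·x[n]. It is not the authors' implementation: the function name, `n_fft`, and `eps` are assumptions made for this example, and the modified and all-pole variants named above add smoothing or model-based steps that are not shown.

```python
import numpy as np

def group_delay(frame, n_fft=512, eps=1e-10):
    """Group delay per DFT bin: (X_R*Y_R + X_I*Y_I) / |X|^2 (hypothetical helper)."""
    x = np.asarray(frame, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)       # spectrum of x[n]
    Y = np.fft.rfft(n * x, n_fft)   # spectrum of n * x[n]
    power = X.real**2 + X.imag**2
    # eps guards against division by (near) zero at spectral nulls
    return (X.real * Y.real + X.imag * Y.imag) / np.maximum(power, eps)

# Sanity check: a unit impulse delayed by 5 samples has group delay 5 at every bin.
delayed = np.zeros(64)
delayed[5] = 1.0
tau = group_delay(delayed)
```

The division by the power spectrum is what makes the raw function spiky near spectral zeros, which is the usual motivation for the modified variants.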
Multitaper MFCC and normalized multitaper phase-based features for speaker verification
This work proposes a phase information extraction method that normalizes the variation in multitaper phase according to the frame position of the input speech, to reduce the uncertainty of multitaper phase information in both the state-of-the-art Gaussian mixture model-universal background model (GMM-UBM) baseline and the i-vector speaker verification system.
Tandem Features for Text-Dependent Speaker Verification on the RedDots Corpus
Both the tandem feature-based system and fused system provided significant improvements over the baseline GMM/UBM system in terms of equal error rates (EER) and detection cost functions (DCFs) as defined in the 2008 and 2010 NIST speaker recognition evaluations.
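For readers unfamiliar with the EER metric mentioned in these summaries, here is a minimal sketch of how it can be estimated from trial scores: sweep a threshold over all observed scores and take the point where the miss rate and false-alarm rate are closest. The function name and the toy scores are assumptions for illustration; evaluation toolkits typically interpolate the error curves rather than picking the nearest threshold.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER from raw scores (hypothetical helper, no interpolation)."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # Miss rate: fraction of target trials scoring below the threshold.
    fnr = np.array([(target < t).mean() for t in thresholds])
    # False-alarm rate: fraction of non-target trials at or above the threshold.
    fpr = np.array([(nontarget >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(fnr - fpr))
    return 0.5 * (fnr[i] + fpr[i])

# Toy scores, invented for this example only.
eer = equal_error_rate([0.9, 0.8, 0.3, 0.7], [0.1, 0.2, 0.6, 0.4])
```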
First Experiments on Speaker Identification Combining a New Shift-invariant Phase-related Feature (NRD), MFCCs and F0 Information
A number of speaker identification experiments are reported that assume a phonetic-oriented segmentation scheme exists, so as to motivate the extraction of psychoacoustically motivated phase- and pitch-related features.
Replay attack detection with auditory filter-based relative phase features
This paper improves the discriminating ability of the RP feature by proposing two new auditory filter-based RP features for replay attack detection. The scores of the proposed RP features are also combined with those of a standard magnitude-based feature, the constant Q transform cepstral coefficient (CQCC), to make the detection decision more reliable.
Exploitation of Phase-Based Features for Whispered Speech Emotion Recognition
This paper proposes a new speech emotion recognition framework, employing outer product in combination with power and L2 normalization, and shows that combining phase information with magnitude information could significantly improve performance over the common systems solely adopting magnitude information.
Analysis of Complementary Information Sources in the Speaker Embeddings Framework
It is found that the first and second embedding layers are complementary in nature, and relative improvements in equal error rate of 17% on NIST SRE 2016 and 10% on SITW over the baseline system are demonstrated.
Feature-switching: Dynamic feature selection for an i-vector based speaker verification system
This paper proposes an alternative technique which achieves a similar effect but utilizes a more effective feature selection technique, known as feature-switching, which achieves improved performance compared to conventional as well as fusion-based systems.
Articulatory movement features for short-duration text-dependent speaker verification
Experimental results show that the AMFs can bring significant performance gains over the traditional MFCC features for the short-duration text-dependent speaker verification task.
Speaker verification with short utterances: a review of challenges, trends and opportunities
The authors present an extensive survey of SV with short utterances, considering studies from the recent past, and include the latest research offering various solutions and analyses to address the limited data issue within the scope of SV.
Spoofing Detection on the ASVspoof 2015 Challenge Corpus Employing Deep Neural Networks
This paper describes the application of deep neural networks (DNN), trained to discriminate between human and spoofed speech signals, to improve the performance of spoofing detection. In this work we [...]


Combining evidence from residual phase and MFCC features for speaker recognition
An EER of 10.5% is obtained, indicating that speaker-specific excitation information is present in the residual phase, which is useful since it is complementary to that of MFCCs.
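The residual phase referred to here is commonly described as the cosine of the phase of the analytic signal of the linear prediction (LP) residual, that is, the residual divided by its Hilbert envelope. The sketch below follows that description under stated assumptions: the LP order, the small ridge term, and all function names are choices made for this example and are not taken from the paper.

```python
import numpy as np

def lp_coefficients(x, order):
    """Autocorrelation-method LP: solve the normal equations R a = r."""
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)  # small ridge for numerical stability (assumption)
    return np.linalg.solve(R, r[1 : order + 1])

def analytic_signal(x):
    """FFT-based analytic signal: zero the negative frequencies, double the positive."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1 : n // 2] = 2.0
    else:
        gain[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)

def residual_phase(x, order=10, eps=1e-12):
    """Cosine of the analytic-signal phase of the LP residual."""
    x = np.asarray(x, dtype=float)
    a = lp_coefficients(x, order)
    # Predict x[n] from the previous `order` samples, then subtract.
    predicted = np.convolve(x, np.concatenate([[0.0], a]))[: len(x)]
    residual = x - predicted
    envelope = np.abs(analytic_signal(residual))  # Hilbert envelope
    return residual / np.maximum(envelope, eps)

# Synthetic test frame (a noisy sinusoid), invented for this example only.
rng = np.random.default_rng(0)
frame = np.sin(0.3 * np.arange(400)) + 0.1 * rng.standard_normal(400)
phase = residual_phase(frame)
```

Dividing by the envelope removes the amplitude of the residual, so what remains lies in [-1, 1] and reflects excitation timing rather than energy.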
The Delta-Phase Spectrum With Application to Voice Activity Detection and Speaker Recognition
Experiments show that mel-frequency cepstral coefficient features derived from the delta-phase spectrum can produce broadly similar performance to equivalent magnitude domain features for both voice activity detection and speaker recognition tasks.
Multitaper MFCC and PLP features for speaker verification using i-vectors
Speaker verification results on the telephone and microphone speech of the latest NIST 2010 SRE corpus indicate that the multi-taper methods outperform the conventional periodogram technique.
Using group delay functions from all-pole models for speaker recognition
It is shown that group delay features derived via parametric all-pole models improve recognition accuracy, especially under high vocal effort, and provide comparable or improved accuracy over conventional magnitude-based MFCC features.
Synthetic speech detection using temporal modulation feature
From the synthetic speech detection results, the modulation features provide complementary information to magnitude/phase features, and the best detection performance is obtained by fusing phase modulation features and phase features, yielding an equal error rate.
Product of power spectrum and group delay function for speech recognition
  • Donglai Zhu, K. Paliwal
  • Mathematics, Computer Science
  • 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing
  • 2004
Results show that the cepstral features derived from the power spectrum perform better than those from the MGDF, and the product spectrum based features provide the best performance.
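The product spectrum named in the title multiplies the power spectrum by the group delay function. Since the group delay is (X_R Y_R + X_I Y_I) / |X|^2, with Y the DFT of n·x[n], the product reduces to X_R Y_R + X_I Y_I and needs no division. A minimal sketch of that quantity for one frame follows; the function name and `n_fft` are assumptions for this example, and the cepstral feature extraction built on top of it is not shown.

```python
import numpy as np

def product_spectrum(frame, n_fft=512):
    """Power spectrum times group delay: X_R*Y_R + X_I*Y_I (hypothetical helper)."""
    x = np.asarray(frame, dtype=float)
    X = np.fft.rfft(x, n_fft)                      # spectrum of x[n]
    Y = np.fft.rfft(np.arange(len(x)) * x, n_fft)  # spectrum of n * x[n]
    return X.real * Y.real + X.imag * Y.imag

# Sanity check: an impulse of amplitude 2 at n = 3 has power 4 and group delay 3,
# so the product spectrum is 12 at every bin.
impulse = np.zeros(32)
impulse[3] = 2.0
q = product_spectrum(impulse)
```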
Front-end Diversity in Fused Speaker Recognition Systems
Some possible variations to the extraction of MFCCs that produce diversity with respect to fused subsystems based on different MFCC-variant features are investigated, including the use of different filter shapes.
Short-time phase spectrum in speech processing: A review and some experimental results
It is suggested that a short-time phase spectrum feature set may ultimately be derived from a concatenation of information from both the GDF and IFD representations, and that these features perform worse than the standard MFCC features.
The modified group delay function and its application to phoneme recognition
  • H. Murthy, V. R. Gadde
  • Computer Science
  • 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).
  • 2003
A new spectral representation of speech signals, based on group delay functions and parameterized as cepstral coefficients, is explored; it reduces the effects of zeros close to the unit circle in the z-domain, which clutter the spectra.
A new phase-based feature representation for robust speech recognition
The aim of this paper is to introduce a novel phase-based feature representation for robust speech recognition. This method consists of four main parts: autoregressive (AR) model extraction, group [...]