Optimization of data-driven filterbank for automatic speaker verification

Susanta Kumar Sarangi, Md. Sahidullah, Goutam Saha

Combination of Time-domain, Frequency-domain, and Cepstral-domain Acoustic Features for Speech Commands Classification

An improved BSR feature, called BSR-float16, is proposed to represent floating-point values more precisely and thereby improve the final classification accuracy; the fusion results also showed better noise robustness.

A Hybrid Meta-Heuristic Feature Selection Method for Identification of Indian Spoken Languages From Audio Signals

A new nature-inspired feature selection (FS) algorithm is developed by hybridizing Binary Bat Algorithm with Late Acceptance Hill-Climbing (LAHC) to select the optimal subset from the said feature vectors in order to reduce the model complexity and help it train faster.

Time-Varying Spectral Kurtosis: Generalization of Spectral Kurtosis for Local Damage Detection in Rotating Machines under Time-Varying Operating Conditions

A generalization of the selector approach using the example of spectral kurtosis is proposed, which assumes creating a time-varying selector that can be seen as a spatial filter in the time-frequency domain.

Review of Feature Extraction on Video-Oculography (VOG) and Electro-Oculography (EOG) Signals

This paper systematically describes feature extraction that is suitable for use in VOG and EOG signal analysis and can be used as a reference for developing feature extraction algorithms for EOG and VOG applications.

Automatically Discovering Relevant Images From Web Pages

This study proposes 15 new features for discovering relevant images, employing features extracted from different web pages consisting of standard news, galleries, video pages, and link pages.

A Study on Android Malware Detection using Selected Features

In the feature selection process, detection performance improved with the number of features, and API features showed relatively better detection performance than permission features, confirming that an appropriate combination of characteristics could improve detection performance.

Machine learning and orthodontics, current trends and the future opportunities: A scoping review.



A novel approach in feature level for robust text-independent speaker identification system

  • S. Sarangi, G. Saha
  • Physics, Computer Science
    2012 4th International Conference on Intelligent Human Computer Interaction (IHCI)
  • 2012
Speech-signal-based frequency cepstral coefficients (SFCC) are introduced in the speaker recognition domain, and a combination of the filter banks of both MFCC and SFCC is proposed for text-independent speaker identification.

Data-driven spectral basis functions for automatic speech recognition

Improved Closed Set Text-Independent Speaker Identification by Combining MFCC with Evidence from Flipped Filter Banks

This paper proposes a new set of features using a complementary filter bank structure which improves distinguishability of speaker specific cues present in the higher frequency zone when combined with MFCC via a parallel implementation of speaker models, and outperforms baseline MFCC significantly.

Optimization of temporal filters for constructing robust features in speech recognition

  • J. Hung, Lin-Shan Lee
  • Computer Science, Engineering
    IEEE Transactions on Audio, Speech, and Language Processing
  • 2006
It was found that the new optimization criteria of principal component analysis (PCA) and the minimum classification error (MCE) for constructing the temporal filters lead to superior performance over the original MFCC features, just as LDA-derived filters can.

Mean Hilbert envelope coefficients (MHEC) for robust speaker and language identification

Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition

  • Chanwoo Kim, R. Stern
  • Computer Science
    IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • 2016
Experimental results demonstrate that PNCC processing provides substantial improvements in recognition accuracy compared to MFCC and PLP processing for speech in the presence of various types of additive noise and in reverberant environments, with only slightly greater computational cost than conventional MFCC processing.

Data Driven Design of Filter Bank for Speech Recognition

This work presents a method where the filter bank, optimized for discriminability between phonemes, is derived directly from phonetically labeled speech data using Linear Discriminant Analysis, proving the fact that incorporation of psychoacoustic findings into feature extraction can lead to better recognition performance.

Data-Driven Temporal Filters and Alternatives to GMM in Speaker Verification

A novel method for designing filters that are capable of normalizing the variability introduced by different telephone handsets is introduced and the effectiveness of the proposed channel normalizing filter in improving speaker verification performance in mismatched conditions is demonstrated.