Optimization of data-driven filterbank for automatic speaker verification

@article{Sarangi2020OptimizationOD,
  title={Optimization of data-driven filterbank for automatic speaker verification},
  author={Susanta Kumar Sarangi and Md. Sahidullah and Goutam Saha},
  journal={ArXiv},
  year={2020},
  volume={abs/2007.10729}
}
Abstract Most speech processing applications use triangular filters spaced on the mel scale for feature extraction. In this paper, we propose a new data-driven filter design method which optimizes filter parameters from given speech data. First, we introduce a frame-selection-based approach for developing a speech-signal-based frequency warping scale. Then, we propose a new method for computing the filter frequency responses by using principal component analysis (PCA). The main advantage of…
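For orientation, the two ingredients the abstract names can be sketched in plain Python. The first function builds the conventional mel-spaced triangular filterbank that the paper takes as its baseline. The second shows, only as a loose illustration, how a filter shape could be taken from data as the first principal component of a set of subband spectra. The abstract is truncated, so the function names, the mel formula variant, the power-iteration PCA, and the idea of using the first component are all assumptions here and not the authors' actual procedure.

```python
import math


def hz_to_mel(f_hz):
    """One common mel formula; other variants exist."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def triangular_mel_filterbank(n_filters, n_fft, sample_rate):
    """Return n_filters rows, each holding n_fft // 2 + 1 weights in [0, 1]."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies, equally spaced on the mel axis.
    edges_hz = [mel_to_hz(mel_max * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[j - 1], edges_hz[j], edges_hz[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def first_principal_component(rows, n_iter=200):
    """Hypothetical helper: leading eigenvector of the covariance of `rows`
    (frames x bins), found by power iteration. An assumption for
    illustration, not the paper's PCA procedure."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - mean[k] for k in range(d)] for r in rows]
    # Deterministic, non-uniform start vector.
    v = [float(k + 1) for k in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(n_iter):
        # w = X^T (X v), i.e. the (unnormalized) covariance applied to v.
        proj = [sum(c[k] * v[k] for k in range(d)) for c in centered]
        w = [sum(proj[i] * centered[i][k] for i in range(n)) for k in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v
```

Calling `triangular_mel_filterbank(20, 512, 16000)` gives 20 filters over 257 FFT bins. In a data-driven variant, `first_principal_component` would be applied to the spectra of selected frames restricted to one subband, and the resulting vector would stand in for the fixed triangle.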
5 Citations
A Hybrid Meta-Heuristic Feature Selection Method for Identification of Indian Spoken Languages From Audio Signals
TLDR
A new nature-inspired feature selection (FS) algorithm is developed by hybridizing Binary Bat Algorithm with Late Acceptance Hill-Climbing (LAHC) to select the optimal subset from the said feature vectors in order to reduce the model complexity and help it train faster.
Time-Varying Spectral Kurtosis: Generalization of Spectral Kurtosis for Local Damage Detection in Rotating Machines under Time-Varying Operating Conditions
TLDR
A generalization of the selector approach using the example of spectral kurtosis is proposed, which assumes creating a time-varying selector that can be seen as a spatial filter in the time-frequency domain.
Automatically Discovering Relevant Images From Web Pages
TLDR
This study proposes 15 new features for discovering relevant images, using features extracted from different web pages consisting of standard news, galleries, video pages, and link pages.
Machine learning and orthodontics, current trends and the future opportunities: A scoping review.
TLDR
AI can help orthodontists save time and provide accuracy comparable to trained dentists in diagnostic assessments and prognostic predictions, and based on current studies, the most promising applications were cephalometry landmark detection, skeletal classification, and decision making on tooth extractions.
GIS-based ensemble computational models for flood susceptibility prediction in the Quang Binh Province, Vietnam
Abstract Recently, floods are occurring more frequently every year around the world due to increased anthropogenic activities and climate change. There is a need to develop accurate models for flood…

References

SHOWING 1-10 OF 103 REFERENCES
A novel approach in feature level for robust text-independent speaker identification system
  • S. Sarangi, G. Saha
  • Computer Science
  • 2012 4th International Conference on Intelligent Human Computer Interaction (IHCI)
  • 2012
TLDR
Speech-signal-based frequency cepstral coefficients (SFCC) are introduced in the speaker recognition domain, and a combination of the filter banks of both MFCC and SFCC is proposed for text-independent speaker identification.
Data-driven spectral basis functions for automatic speech recognition
TLDR
Stochastic methods for designing feature extraction methods which are trained to alleviate the unwanted variability present in speech signals are proposed and shown to provide significant advantages over the conventional methods both in terms of performance of ASR and in providing understanding about the nature of speech signal.
Improved Closed Set Text-Independent Speaker Identification by Combining MFCC with Evidence from Flipped Filter Banks
A state-of-the-art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years,…
Optimization of temporal filters for constructing robust features in speech recognition
  • J. Hung, Lin-Shan Lee
  • Mathematics, Computer Science
  • IEEE Transactions on Audio, Speech, and Language Processing
  • 2006
TLDR
It was found that the new optimization criteria of principal component analysis (PCA) and the minimum classification error (MCE) for constructing the temporal filters lead to superior performance over the original MFCC features, just as LDA-derived filters can.
Design, analysis and experimental evaluation of block based transformation in MFCC computation for speaker recognition
TLDR
A class of linear transformation techniques based on block-wise transformation of MFLE, which effectively decorrelate the filter bank log energies and also capture speech information in an efficient manner, is studied.
Mean Hilbert envelope coefficients (MHEC) for robust speaker and language identification
TLDR
Experimental results indicate that: (i) the MHEC feature is highly effective and performs favorably compared to other conventional and state-of-the-art front-ends, and (ii) the power-law non-linearity consistently yields the best performance across different conditions for both SID and LID tasks.
Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition
  • Chanwoo Kim, R. Stern
  • Computer Science
  • IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • 2016
TLDR
Experimental results demonstrate that PNCC processing provides substantial improvements in recognition accuracy compared to MFCC and PLP processing for speech in the presence of various types of additive noise and in reverberant environments, with only slightly greater computational cost than conventional MFCC processing.
Data Driven Design of Filter Bank for Speech Recognition
TLDR
This work presents a method where the filter bank, optimized for discriminability between phonemes, is derived directly from phonetically labeled speech data using Linear Discriminant Analysis, supporting the view that incorporation of psychoacoustic findings into feature extraction can lead to better recognition performance.
A perceptually-motivated low-complexity instantaneous linear channel normalization technique applied to speaker verification
TLDR
Speaker verification results demonstrate that the proposed LNCC features are of low computational complexity and far more effectively compensate for spectral tilt than ordinary MFCC coefficients.
Data-Driven Temporal Filters and Alternatives to GMM in Speaker Verification
TLDR
A novel method for designing filters that are capable of normalizing the variability introduced by different telephone handsets is introduced and the effectiveness of the proposed channel normalizing filter in improving speaker verification performance in mismatched conditions is demonstrated.