Publications
An overview of text-independent speaker recognition: From features to supervectors
TLDR
This paper starts with the fundamentals of automatic speaker recognition, covering feature extraction and speaker modeling, and then elaborates on advanced computational techniques that address robustness and session variability.
Text-dependent speaker verification: Classifiers, databases and RSR2015
TLDR
Two systems are evaluated: HiLAM, based on a three-layer acoustic architecture, and an i-vector/PLDA system. HiLAM outperforms the state-of-the-art i-vector system in most scenarios, and the paper provides the research community with a reference evaluation scheme and reference performance on the RSR2015 database.
A Joint Source-Channel Model for Machine Transliteration
TLDR
A new framework that allows direct orthographic mapping between two languages through a joint source-channel model, also called the n-gram transliteration model (TM). The framework greatly reduces system development effort and substantially improves transliteration accuracy over other state-of-the-art machine learning algorithms.
Spoken Language Recognition: From Fundamentals to Practice
TLDR
This paper provides an introductory tutorial on the theoretical fundamentals and state-of-the-art solutions of spoken language recognition, covering both phonological and computational aspects.
Spoofing and countermeasures for speaker verification: A survey
TLDR
This survey reviews past work and identifies priority research directions, concluding that future research should address the lack of standard datasets and the overfitting of existing countermeasures to specific, known spoofing attacks.
A learning-based approach to direction of arrival estimation in noisy and reverberant environments
TLDR
A learning-based approach to robust direction of arrival (DOA) estimation that learns from a large amount of simulated noisy and reverberant microphone array inputs, using a multilayer perceptron neural network to learn the nonlinear mapping from features extracted from these inputs to the DOA.
Precise-Spike-Driven Synaptic Plasticity: Learning Hetero-Association of Spatiotemporal Spike Patterns
TLDR
Experimental results show that the precise-spike-driven (PSD) rule can classify spatiotemporal patterns and, with the proposed relative confidence criterion, can even outperform a well-studied benchmark algorithm.
Low-Variance Multitaper MFCC Features: A Case Study in Robust Speaker Verification
TLDR
This paper provides a detailed statistical analysis of the bias and variance of mel-frequency cepstral coefficients (MFCCs), using autoregressive process simulations on the TIMIT corpus, and proposes the multitaper method for MFCC extraction with a practical focus.
A Vector Space Modeling Approach to Spoken Language Identification
TLDR
The proposed vector space modeling (VSM) approach leads to a discriminative classifier backend, which outperforms the likelihood-based n-gram language modeling (LM) backend on long utterances.
Detecting Converted Speech and Natural Speech for anti-Spoofing Attack in Speaker Recognition
TLDR
Experiments show that features derived from the phase spectrum substantially outperform mel-frequency cepstral coefficients (MFCCs): even without converted speech in the training data, the equal error rate (EER) drops from 20.20% with MFCCs to 2.35%.