Investigations into prosodic syllable contour features for speaker recognition

  • Marcel Kockmann, Lukáš Burget, Jan Honza Černocký
  • 2010 IEEE International Conference on Acoustics, Speech and Signal Processing
We investigate various ways of generating prosodic syllable contour features that have recently been applied to enhance systems for speaker recognition. We compare different approaches for segmentation of speech into syllable-like units, techniques for contour modeling and the extraction of pitch and energy, taking into account the computational complexity and gender dependence. We show that the performance is especially affected by the segmentation and the quality of the pitch tracking… 

Prosodic features and formant modeling for an ivector-based language recognition system

Prosodic and formant information are combined to build a generative language identification system based on Gaussian models fed with iVectors based on shifted delta cepstral coefficients (SDC), showing the complementarity of both approaches.

Extraction and Representation of Prosody for Speaker, Language, Emotion, and Speech Recognition

  • L. Mary
  • Computer Science
    SpringerBriefs in Speech Technology
  • 2018
In this chapter, different techniques suggested for automatic extraction of prosodic features are described and are broadly classified as automatic speech recognizer (ASR)-free and ASR-based approaches.

Recent progress in prosodic speaker verification

It is shown that performance can be significantly improved with the use of probabilistic linear discriminant analysis (PLDA) for session variability compensation, and this system does not require score normalization.

Extraction and Representation of Prosody for Speaker, Speech and Language Recognition

  • L. Mary
  • Psychology
    Springer Briefs in Electrical and Computer Engineering
  • 2012
This book deals with prosody from a speech processing point of view, with topics including the significance of prosody for speech processing applications, different methods for extraction and representation of prosodic features for speech processing applications, and more.

MFCC and Prosodic Feature Extraction Techniques: A Comparative Study

This paper explores the usefulness of prosodic features for syllable classification and MFCC for feature extraction of a speech signal, followed by a comparison between them and the difference between cepstral and non-cepstral feature extraction techniques.

iVector-based prosodic system for language identification

An automatic language recognition system that extracts prosody information from speech and makes decisions about the language with a generative classifier based on iVectors is built and the fusion of the new approach with an iVector-based acoustic system is found to bring further improvements over the latter.

iVector Fusion of Prosodic and Cepstral Features for Speaker Verification

In this paper we apply the promising iVector extraction technique followed by PLDA modeling to simple prosodic contour features. With this procedure we achieve results comparable to a system that

THE SRI NIST 2008 speaker recognition evaluation system

The importance of language and nativeness conditioning, as well as the role of ASR for speaker verification, are shown, and the performance of various subsystem combinations in different data conditions is analyzed.

Prosodic Analysis of Non-Native South Indian English Speech

This study finds that dynamic variation of pitch is the least for English speech by native Kannada language speakers, and the increase in standard deviation of pitch contour for non-native English speech by Kannataka speakers is much less.



Contour modeling of prosodic and acoustic features for speaker recognition

This paper uses acoustic and prosodic features jointly in a long-temporal lexical context for automatic speaker recognition from speech and presents results for the combination of different features on a syllable-level as well as for channel compensation.

Analysis of Feature Extraction and Channel Compensation in a GMM Speaker Recognition System

This paper deals with eigenchannel adaptation in more detail and includes its theoretical background and implementation issues, undermining a common myth that the more boxes in the scheme, the better the system.

Comparison of scoring methods used in speaker recognition with Joint Factor Analysis

It is shown that approximations of the true log-likelihood ratio (LLR) may lead to a significant speedup without any loss in performance.

Score Normalization for Text-Independent Speaker Verification Systems

The test normalization method is extended to use knowledge of the handset type, and the world, cohort, and zero normalization techniques are explained.

BUT system description: NIST SRE 2008

BUT submitted three systems to the NIST SRE 2008 evaluations, only to the short2-short3 condition, and the first contrastive system differs only in calibration.

The NIST speaker recognition evaluation program

The National Institute of Standards and Technology (NIST) has coordinated annual scientific evaluations of text-independent speaker recognition since 1996, focusing primarily on speaker detection in the context of conversational telephone speech.

Hierarchical Structures of Neural Networks for Phoneme Recognition

This paper deals with phoneme recognition based on neural networks (NN), focuses on temporal patterns (TRAPs) and novel split temporal context (STC) phoneme recognizers, and investigates tandem NN architectures.