• Corpus ID: 9144211

Voice Quality Dependent Speech Recognition

  • Taejin Yoon, Xiaodan Zhuang, Jennifer S. Cole, Mark A. Hasegawa-Johnson
Voice quality conveys both linguistic and paralinguistic information, and can be distinguished by acoustic source characteristics. We label objective voice quality categories based on the spectral and temporal structure of speech sounds, specifically the harmonic structure (H1-H2) and the mean autocorrelation ratio of each phone. Results from a classification experiment using a Support Vector Machine (SVM) classifier show that allophones that differ from each other regarding voice quality can… 
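The two acoustic measures named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions. F0 is taken as already known for each frame. H1-H2 is computed as the dB difference between the Hann-windowed DFT magnitudes at F0 and 2·F0, with no formant correction. The "autocorrelation ratio" is read as the autocorrelation at a lag of one pitch period normalized by the frame energy, then averaged over the frames of a phone. All function names are hypothetical.

```python
import cmath
import math
from typing import Sequence


def harmonic_amplitude(frame: Sequence[float], sr: float, freq: float) -> float:
    """Magnitude of the Hann-windowed DFT of `frame` evaluated at `freq` Hz."""
    n = len(frame)
    total = 0j
    for i, x in enumerate(frame):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * i / n)  # periodic Hann window
        total += w * x * cmath.exp(-2j * math.pi * freq * i / sr)
    return abs(total)


def h1_h2_db(frame: Sequence[float], sr: float, f0: float) -> float:
    """H1-H2 in dB: first-harmonic level minus second-harmonic level (uncorrected)."""
    h1 = harmonic_amplitude(frame, sr, f0)
    h2 = harmonic_amplitude(frame, sr, 2 * f0)
    # Floor both magnitudes so a silent or unvoiced frame cannot divide by zero.
    return 20 * math.log10(max(h1, 1e-12) / max(h2, 1e-12))


def autocorr_ratio(frame: Sequence[float], sr: float, f0: float) -> float:
    """Autocorrelation at a lag of one pitch period, divided by the lag-0 energy."""
    mean = sum(frame) / len(frame)
    x = [v - mean for v in frame]  # remove DC offset before correlating
    lag = round(sr / f0)  # pitch period in samples
    energy = sum(v * v for v in x)
    if energy == 0 or lag >= len(x):
        return 0.0
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag)) / energy


def mean_autocorr_ratio(frames, sr: float, f0s) -> float:
    """Average the per-frame ratio over all frames belonging to one phone."""
    ratios = [autocorr_ratio(f, sr, f0) for f, f0 in zip(frames, f0s)]
    return sum(ratios) / len(ratios)
```

For a synthetic 100 ms frame at 16 kHz whose 200 Hz harmonic has half the amplitude of its 100 Hz fundamental, `h1_h2_db` returns about 6.02 dB (20·log10 2). Higher H1-H2 is commonly associated with breathier phonation, while a low autocorrelation ratio indicates weaker periodicity, as in irregular (creaky) voicing.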

Tables from this paper

Automatic identification of modal, breathy and creaky voices

A method is presented for automatically identifying the different voice qualities present in a speech signal, which is highly beneficial for an efficient speech recognition system that must handle any kind of speech.

Automatic Classification of Regular vs. Irregular Phonation Types

A classifier is proposed that extracts six acoustic cues from vowels and labels them as regular or irregular by means of a support vector machine; cues from earlier phonation-type classifiers were also integrated, improving performance in five of the six cases.

HMM-based Classification of Glottalization Phenomena in German-accented English

The present paper investigates the automatic detection of word-initial glottalization phenomena (glottal stops and creaky voice) in German-accented English by means of HMMs. Glottalization of…

Acoustic Word Disambiguation with Phonological Features in Danish ASR

Danish stød can be predicted from speech and used to improve ASR; the study also identifies acoustic features that are new to the phonetic characterisation of stød.

Low-resource spoken keyword search strategies in Georgian inspired by distinctive feature theory

  • Nancy F. Chen, B. P. Lim, Haizhou Li
  • Computer Science
    2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
  • 2017
Low-resource spoken keyword search strategies guided by distinctive feature theory in linguistics are presented; they drive data selection, feature selection, and transcription augmentation to improve keyword search (KWS) under extremely under-resourced conditions.

A Comparative Study of Gaussian Mixture Model and Radial Basis Function for Voice Recognition

A comparative study of Gaussian Mixture Models (GMM) and Radial Basis Functions (RBF) for biometric voice recognition was carried out; the two achieved very close recognition accuracy, with the GMM performing better than the standard RBF.

Analysis and Synthesis of Glottalization Phenomena in German-Accented English

The present paper investigates the analysis and synthesis of glottalization phenomena in German-accented English. Word-initial glottalization was manually annotated in a subset of a German-accented…

Strategies for Vietnamese keyword search

To the best of the authors' knowledge, the proposed transliteration framework is the first reported rule-based system for Vietnamese; it outperforms statistical baselines by up to 14.93-36.73% relative on foreign loanword search tasks.

A novel Gaussianized vector representation for natural scene categorization

This paper presents a novel Gaussianized vector representation for scene images, obtained by an unsupervised approach, and proves that these super-vectors follow the standard normal distribution.



The Importance of Prosodic Factors in Phoneme Modeling with Applications to Speech Recognition

Modeling all prosodic information directly in the vowel model is found to improve that model.

Acoustic correlates of non‐modal phonation in telephone speech

Non‐modal phonation conveys both linguistic and paralinguistic information, and is distinguished by acoustic source and filter features. Detecting non‐modal phonation in speech requires reliable F0…

The voice source in connected speech

Perceptual linear predictive (PLP) analysis of speech.

  • H. Hermansky
  • Physics
    The Journal of the Acoustical Society of America
  • 1990
A new technique for speech analysis, perceptual linear predictive (PLP) analysis, is presented; it uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum and yields a low-dimensional representation of speech.
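The three psychophysical concepts PLP applies are critical-band (Bark-scale) spectral resolution, the equal-loudness curve, and the intensity-loudness power law. The sketch below shows only those three mappings, using the forms commonly quoted from Hermansky (1990); it leaves out the critical-band filter integration, the all-pole (linear prediction) fit, and cepstral conversion, so it is an illustration under those assumptions rather than a full PLP front end. The function names are hypothetical.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Bark warping 6*ln(w/1200pi + sqrt((w/1200pi)^2 + 1)), with w = 2*pi*f."""
    # w / (1200*pi) equals f / 600, and ln(x + sqrt(x^2 + 1)) is asinh(x).
    return 6.0 * math.asinh(f_hz / 600.0)


def equal_loudness(f_hz: float) -> float:
    """Approximate equal-loudness weight E(w) at angular frequency w = 2*pi*f."""
    w2 = (2.0 * math.pi * f_hz) ** 2
    return ((w2 + 56.8e6) * w2 ** 2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))


def loudness(intensity: float, exponent: float = 0.33) -> float:
    """Intensity-to-loudness power law; an exponent of 0.33 approximates a cube root."""
    return intensity ** exponent


def perceptual_spectrum(band_energies, band_centers_hz):
    """Weight critical-band energies by equal loudness, then compress them."""
    return [loudness(equal_loudness(f) * e) for e, f in zip(band_energies, band_centers_hz)]
```

For example, `hz_to_bark(600.0)` is 6·asinh(1), about 5.29 Bark. The cube-root compression also reduces the spectral-amplitude variation of the auditory spectrum, so that the subsequent all-pole model can be of low order.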

The phonetic description of voice quality

The importance of an individual's voice in everyday social interaction can scarcely be overestimated. It is an essential element in the listener's analysis of the speaker's physical, psychological…

SWITCHBOARD: telephone speech corpus for research and development

  • J. Godfrey, E. Holliman, J. McDaniel
  • Physics, Linguistics
    [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing
  • 1992
SWITCHBOARD is a large multispeaker corpus of conversational speech and text which should be of interest to researchers in speaker authentication and large vocabulary speech recognition. About 2500…

Glottal characteristics of male speakers: acoustic correlates and comparison with female data.

Observations of the speech waveforms and spectra suggest the presence of a second glottal excitation within a glottal period for some of the male speakers, consistent with fiberscopic studies showing that males tend to have more complete glottal closure, leading to less energy loss at the glottis and less spectral tilt.

Probabilistic classification of HMM states for large vocabulary continuous speech recognition

  • Xiaoqiang Luo, F. Jelinek
  • Computer Science
    1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258)
  • 1999
A probabilistic classification of HMM states (PCHMM) is proposed, introducing a probability distribution from each HMM state to a set of classes; this makes the acoustic model more robust against possible mismatch or variation between training and test data.
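One way to read the state-to-class distribution, offered as an interpretation rather than the paper's exact formulation: each HMM state $s$ carries a distribution $P(c \mid s)$ over a shared set of $C$ classes, each class has its own observation density $p(\mathbf{o}_t \mid c)$, and the state's emission likelihood becomes a mixture over classes.

```latex
b_s(\mathbf{o}_t) = \sum_{c=1}^{C} P(c \mid s)\, p(\mathbf{o}_t \mid c),
\qquad \sum_{c=1}^{C} P(c \mid s) = 1
```

Under this reading, because the class densities are shared across states, a test observation that one class matches poorly can still receive probability mass from other classes the state is linked to, which is one plausible source of the robustness to training/test mismatch.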

Phonation types: a cross-linguistic overview

Differences in phonation type signal important linguistic information in many languages, including contrasts between otherwise identical lexical items and boundaries of prosodic constituents, according to a recurring set of articulatory, acoustic, and timing properties.


The ISIP Automatic Speech Recognition system (ISIP-ASR) used for the Hub-5 2000 English evaluations is a public-domain, cross-word, context-dependent, HMM-based system that has all the functionality normally expected in a large vocabulary continuous speech recognition (LVCSR) system.