Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Se

S. Davis and Paul Mermelstein

Several parametric representations of the acoustic signal were compared with regard to word recognition performance in a syllable-oriented continuous speech recognition system. The vocabulary included many phonetically similar monosyllabic words; therefore, the emphasis was on the ability to retain phonetically significant acoustic information in the face of syntactic and duration variations. For each parameter set (based on a mel-frequency cepstrum, a linear frequency cepstrum, a linear… 
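
The mel-frequency cepstrum compared above can be sketched for a single frame: window the samples, take the power spectrum, sum it under triangular filters spaced on a mel scale, take the log of each filter energy X_k, and apply a cosine transform to those log energies. This is a minimal pure-Python sketch, not the paper's implementation: the mel formula 2595·log10(1 + f/700), the evenly mel-spaced filter placement, the Hamming window, the log floor, and the defaults (8 kHz sampling, 256-point DFT, 20 filters, 6 coefficients) are all assumptions, and the function names are hypothetical.

```python
import math

def hz_to_mel(f):
    # Mel formula common in later toolkits (assumption; not the paper's exact filter spacing)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(x, n_fft):
    """|DFT|^2 for bins 0..n_fft//2, via a naive DFT of the zero-padded frame."""
    x = list(x) + [0.0] * (n_fft - len(x))
    out = []
    for k in range(n_fft // 2 + 1):
        re = sum(x[n] * math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft))
        im = sum(x[n] * math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft))
        out.append(re * re + im * im)
    return out

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [e * n_fft / sample_rate for e in edges_hz]  # fractional DFT-bin positions
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        weights = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                weights.append((k - lo) / (mid - lo))   # rising edge
            elif mid < k < hi:
                weights.append((hi - k) / (hi - mid))   # falling edge
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank

def mfcc(frame, sample_rate=8000, n_fft=256, n_filters=20, n_ceps=6):
    n = len(frame)
    # Hamming window applied before the DFT (assumption)
    windowed = [frame[i] * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i in range(n)]
    spec = power_spectrum(windowed, n_fft)
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    # X_k: log energy at the output of filter k, floored to avoid log(0)
    log_e = [math.log(max(sum(w * s for w, s in zip(f, spec)), 1e-10)) for f in bank]
    # c_i = sum_{k=1..N} X_k * cos(i * (k - 1/2) * pi / N), for i = 1..n_ceps
    return [sum(log_e[k - 1] * math.cos(i * (k - 0.5) * math.pi / n_filters)
                for k in range(1, n_filters + 1))
            for i in range(1, n_ceps + 1)]

# Usage: a 25 ms frame of a 500 Hz tone sampled at 8 kHz
frame = [math.sin(2 * math.pi * 500 * t / 8000) for t in range(200)]
coeffs = mfcc(frame)
print([round(c, 3) for c in coeffs])
```

Because the cosine weights for i ≥ 1 sum to zero across the filters, a uniform gain change (which adds the same constant to every X_k) leaves these coefficients unchanged; this is one reason the zeroth, energy-like term is usually handled separately.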

Speech recognition of mandarin monosyllables

  • T. F. Li
  • Pattern Recognit.
  • 2003

A comparison of feature representations for speaker-independent voiced-stop-consonant recognition

It is concluded that the feature representations produced by Seneff's (1988) auditory model, particularly the mean-rate response representation, are good representations for voiced-stop-consonant speech as well as for vowel speech, and that adding dynamic feature information, in the form of differenced cepstral coefficients, to the conglomerate mel-cepstral representative vectors made a difference in the recognition rate.
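
Appending differenced cepstral coefficients to each frame vector, as described above, can be sketched as follows. The summary does not say how the differences were computed, so this sketch assumes the regression formula d_t = Σ_{n=1}^{K} n(c_{t+n} − c_{t−n}) / (2 Σ_{n=1}^{K} n²) with K = 2 and repeated edge frames; `delta` and `append_deltas` are hypothetical names.

```python
def delta(frames, K=2):
    """Regression deltas d_t = sum_n n*(c[t+n] - c[t-n]) / (2*sum_n n^2),
    with the first and last frames repeated at the edges (assumption)."""
    T = len(frames)
    denom = 2.0 * sum(n * n for n in range(1, K + 1))
    out = []
    for t in range(T):
        d = []
        for j in range(len(frames[t])):
            acc = 0.0
            for n in range(1, K + 1):
                nxt = frames[min(t + n, T - 1)][j]  # clamp at the last frame
                prv = frames[max(t - n, 0)][j]      # clamp at the first frame
                acc += n * (nxt - prv)
            d.append(acc / denom)
        out.append(d)
    return out

def append_deltas(frames, K=2):
    """Concatenate each static cepstral vector with its delta vector."""
    return [list(c) + d for c, d in zip(frames, delta(frames, K))]

# Usage: first coefficient rises linearly, second is constant
ramp = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
for row in append_deltas(ramp):
    print(row)
```

In the interior of a linear ramp the delta equals the slope (1.0 here), while the repeated edge frames pull the first and last deltas toward zero; a constant coefficient yields a zero delta everywhere.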

Evaluation of mel-LPC cepstrum in a large vocabulary continuous speech recognition

  • H. Matsumoto, Masanori Moroto
  • 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221)
  • 2001
This paper compares the recognition performance of the mel-LPC cepstrum with that of both the standard LPC mel-cepstrum and the MFCC, using a Japanese dictation system with a 20,000-word vocabulary, and finds that the mel-LPC cepstrum performs slightly better than the MFCC.

A syllable, articulatory-feature, and stress-accent model of speech recognition

Analysis results provide evidence for an alternative approach of speech modeling, one in which the syllable assumes pre-eminent status and is melded to the lower as well as the higher tiers of linguistic representation through the incorporation of prosodic information such as stress accent.

Significance of group delay based acoustic features in the linguistic search space for robust speech recognition

In this paper we discuss the complementarity of the group delay features with respect to other conventional acoustic features and also propose the use of such diverse information in the linguistic

Recognition Of Phonemes In A-Cappella Recordings Using Temporal Patterns And Mel Frequency Cepstral Coefficients

Two alternative methods for classifying phonemes in singing, one using Mel-Frequency Cepstral Coefficient features and the other using Temporal Patterns, are combined into a new type of classifier that performs better than either classifier alone.

Acoustic-Phonetic Feature Based Dialect Identification in Hindi Speech

A method is presented for identifying Hindi dialects and examining the contribution of different acoustic-phonetic features, with the aim of measuring the capability of auto-associative neural networks to capture non-linear relations specific to the information in spectral features.

Modeling lexical tones for mandarin large vocabulary continuous speech recognition

This dissertation proposes several new strategies for tone modeling and explores their effectiveness in state-of-the-art HMM-based Mandarin large vocabulary speech recognition systems in two domains: conversational telephone speech and broadcast news.

A novel feature transformation for vocal tract length normalization in automatic speech recognition

This paper proposes a method to transform acoustic models that have been trained with a certain group of speakers for use on different speech in hidden Markov model based (HMM-based) automatic speech



Evaluation of acoustic parameters for monosyllabic word identification

Several recent investigations have hypothesized that syllable‐sized segments may be more appropriate units than phoneme‐sized segments for use in continuous speech recognition systems. The

Recognition of monosyllabic words in continuous sentences using composite word templates

A modified dynamic programming algorithm is presented that allows reference information to be built up from a speaker's productions despite variations in acoustic form induced by the syntactic role of the word in the sentence.

Order dependence in templates for monosyllabic word identification

The ordering of words during template generation did not significantly affect word identification and the average correct identification in open tests for each speaker was 94.76% and 90.53%, with standard deviations of 0.53%.

Automatic segmentation of speech into syllabic units.

  • P. Mermelstein
  • The Journal of the Acoustical Society of America
  • 1975
It is suggested that inclusion of alternative fluent‐form syllabifications for multisyllabic words and the use of phonological rules for predicting syllabic contractions can further improve agreement between predicted and experimental syllable counts.

Minimum prediction residual principle applied to speech recognition

A computer system is described in which isolated words, spoken by a designated talker, are recognized by computing a minimum prediction residual, obtained by optimally registering the reference LPC coefficients onto the input autocorrelation coefficients with the dynamic programming algorithm.
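
The frame-level comparison behind a minimum-prediction-residual recognizer can be sketched as follows: fit a linear predictor to the reference frame, then measure how much more residual energy it leaves on the input frame's autocorrelation than the input's own optimal predictor does. This is a sketch under stated assumptions (predictor order p = 8, biased autocorrelation, the Levinson-Durbin recursion, the log of the residual ratio as the distance, and hypothetical function names); the dynamic-programming registration across frames is not shown.

```python
import math
import random

def autocorr(x, p):
    """Biased autocorrelation r[0..p] of one frame."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(p + 1)]

def levinson(r, p):
    """Levinson-Durbin recursion: a = [1, a1, ..., ap] minimising the
    energy of the residual e[n] = x[n] + sum_j a_j * x[n - j]."""
    a = [1.0] + [0.0] * p
    err = r[0]
    for i in range(1, p + 1):
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coeff.
        new = a[:]
        for j in range(1, i):
            new[j] = a[j] + k * a[i - j]
        new[i] = k
        a = new
        err *= 1.0 - k * k  # prediction error energy at order i
    return a

def residual_energy(a, r):
    """Quadratic form a^T R a, with R the Toeplitz matrix built from r."""
    p = len(a) - 1
    return sum(a[i] * a[j] * r[abs(i - j)] for i in range(p + 1) for j in range(p + 1))

def log_residual_ratio(ref_frame, test_frame, p=8):
    """Hypothetical frame distance: 0 when the reference predictor is as good
    as the input's own optimal predictor, positive otherwise."""
    r_test = autocorr(test_frame, p)
    a_test = levinson(r_test, p)
    a_ref = levinson(autocorr(ref_frame, p), p)
    return math.log(residual_energy(a_ref, r_test) / residual_energy(a_test, r_test))

# Usage: two noisy tones at different frequencies (synthetic data)
rng = random.Random(0)
x1 = [math.sin(0.3 * n) + 0.5 * rng.uniform(-1, 1) for n in range(240)]
x2 = [math.sin(1.1 * n) + 0.5 * rng.uniform(-1, 1) for n in range(240)]
print(log_residual_ratio(x1, x1), log_residual_ratio(x2, x1))
```

The input's own predictor minimises the residual energy on its autocorrelation, so the ratio is at least 1 and the log distance is non-negative; it is also asymmetric, since only the input's autocorrelation is used.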

Speech recognition experiments with linear predication, bandpass filtering, and dynamic programming

Automatic speech recognition experiments are described in which several popular preprocessing and classification strategies are compared and it is shown that dynamic programming is of major importance for recognition of polysyllabic words.

Considerations in dynamic time warping algorithms for discrete word recognition

Two algorithms were studied: one in which there is uncertainty in the registration of both the initial and the final frames, and another that constrains the dynamic path to follow the locally minimum path at each frame.
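
Dynamic time warping with uncertain initial and final registration, as in the first algorithm above, can be sketched with a basic recursion. This is a minimal illustration, not a reconstruction of the algorithms studied: the symmetric step set, the unnormalised summed distance, and the `relax` parameter (how many reference frames the path may skip at the start or the end) are assumptions, and the locally-minimum-path variant is not shown.

```python
def dtw(test, ref, dist, relax=0):
    """Accumulated distance of the best warping path between two frame
    sequences. The path may begin at any of the first relax+1 reference
    frames and end at any of the last relax+1 (hypothetical relaxation)."""
    INF = float("inf")
    n, m = len(test), len(ref)
    D = [[INF] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = dist(test[i], ref[j])
            if i == 0 and j <= relax:
                D[i][j] = cost  # relaxed start: a path may begin here
                continue
            best = INF
            if i > 0:
                best = min(best, D[i - 1][j])          # vertical step
                if j > 0:
                    best = min(best, D[i - 1][j - 1])  # diagonal step
            if j > 0:
                best = min(best, D[i][j - 1])          # horizontal step
            D[i][j] = cost + best
    # Relaxed end: best total over the last relax+1 reference frames
    return min(D[n - 1][j] for j in range(max(0, m - 1 - relax), m))

def absdiff(a, b):
    """Frame distance for scalar 'frames' in this toy example."""
    return abs(a - b)

print(dtw([1, 2, 3], [1, 2, 2, 3], absdiff))        # 0
print(dtw([5, 6], [0, 5, 6, 9], absdiff))           # 8
print(dtw([5, 6], [0, 5, 6, 9], absdiff, relax=1))  # 0
```

With fixed endpoints the path is forced to pay for the extra reference frames 0 and 9; allowing one frame of slack at each end lets it skip them, which is the kind of endpoint uncertainty the relaxed registration is meant to absorb.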

Syllable as a unit of speech recognition

Irregularities in phonetic manifestations of phonemes are discussed and it is argued that the syllable, phonologically redefined, will serve as the effective minimal unit in the time domain.

A phonetic-context controlled strategy for segmentation and phonetic labeling of speech

The extraction of acoustic cues pertinent to a phonetic feature can be tuned to classes of sounds separated on the basis of other cues, and this serves to increase the reliability of segment labeling.

On creating reference templates for speaker independent recognition of isolated words

A method of combining word patterns from a number of speakers is proposed in which a clustering type of analysis is used to determine which patterns are merged to create a word template.