Modeling partial pronunciation variations for spontaneous Mandarin speech recognition

@article{Liu2002ModelingPP,
  title={Modeling partial pronunciation variations for spontaneous Mandarin speech recognition},
  author={Y. Liu and Pascale Fung},
  journal={Comput. Speech Lang.},
  year={2002},
  volume={17},
  pages={357-379}
}
Abstract: The high error rate in spontaneous speech recognition is due in part to the poor modeling of pronunciation variations. An analysis of acoustic data reveals that pronunciation variations include both complete changes and partial changes. Complete changes are the replacement of a canonical phoneme by an alternative phone, such as ‘b’ being pronounced as ‘p’. Partial changes are variations within the phoneme, such as nasalization, centralization, devoicing, and voicing. Most…
Acoustic Pronunciation Variations Modeling for Standard Malay Speech Recognition
TLDR
Experimental results show that the use of a pronunciation variation dictionary and the method of dynamic search space expansion can improve speech recognition performance substantially.
Modeling Cantonese Pronunciation Variations for Large-Vocabulary Continuous Speech Recognition
TLDR
Experimental results show that the use of a pronunciation variation dictionary and the method of dynamic search space expansion can improve speech recognition performance substantially.
Modeling Sound Changes in Mandarin Spontaneous Speech Using Deleted Interpolation of Mixture Component Weights
Abstract: The high recognition error rate in spontaneous speech is due in part to the poor modeling of pronunciation variations. An analysis of the acoustic data reveals that the variations…
State-dependent phonetic tied mixtures with pronunciation modeling for spontaneous speech recognition
TLDR
A state-dependent phonetic tied-mixture model with variable codebook size is proposed; it incorporates a state-level pronunciation model for better discrimination of phonetic and acoustic confusions while reducing model complexity.
PARTIAL CHANGE ACCENT MODELS SPEECH RECOG
Regional accents in Mandarin speech result mostly from partial phone changes due to the interlanguage system of non-native speakers. We propose partial change accent models based on accent-specific…
Partial change accent models for accented Mandarin speech recognition
  • L. Yi, Pascale Fung
  • Computer Science
  • 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721)
  • 2003
TLDR
This work proposes partial change accent models based on accent-specific units with acoustic model reconstruction for accented Mandarin speech recognition, using phonological rules of dialectal pronunciations together with a likelihood ratio test to model actual accented variants rather than inherent phonetic confusions, recognizer errors, or other data-specific variations.
HMM-based phonemic distance in different speaking styles and its influence on substitutions in Mandarin speech recognition
TLDR
The qualitative relationship between phone size and recognition error rate is analyzed, showing that, for a particular phoneme, pronunciation variety is one reason for misidentification during recognition, which suggests a new way to reduce substitution errors.
Pronunciation Space Models for Pronunciation Evaluation
TLDR
Posterior probability is mostly used for pronunciation evaluation; in experiments on a Chinese database spoken by Koreans, the correlation improves from 0.390 to 0.415 compared with the traditional method based on phone-based acoustic models.
Within-word pronunciation variation modeling for Arabic ASRs: a direct data-driven approach
TLDR
A direct data-driven approach to modeling within-word pronunciation variations, in which the pronunciation variants are distilled from the training speech corpus, shows that while the expanded dictionary alone did not add appreciable improvements, the word error rate is significantly reduced when the variants are represented within the language model.
Automatic Speech Recognition for Non-Native Speakers
TLDR
A hybrid approach of acoustic interpolation and merging is proposed for adapting the target acoustic model, along with a speaker clustering approach called “latent pronunciation analysis” that clusters non-native speakers based on pronunciation habits.

References

SHOWING 1-10 OF 31 REFERENCES
Pronunciation Modeling for Spontaneous Mandarin Speech Recognition
TLDR
It is shown that partial changes are much less clear-cut than previously assumed and cannot be modelled merely by alternate phone units; the approach can be applied to any automatic speech recognition system based on subword units.
Pronunciation modeling by sharing gaussian densities across phonetic models
TLDR
The incorporation of pronunciation models into acoustic model training, in addition to recognition, is described, and a 1.7% improvement in recognition accuracy on the Switchboard corpus is presented.
Automatic modeling of pronunciation variations
  • E. Eide
  • Computer Science
  • EUROSPEECH
  • 1999
TLDR
An automatic method is reported for discovering an appropriate model for each context-dependent phoneme, which allows for such phenomena as reduced pronunciations and substituted phonemes where warranted by observation on training data.
Pronunciation modeling for conversational speech recognition
TLDR
This dissertation provides a fundamental and quantitative insight into pronunciation variability in spontaneous speech and demonstrates techniques for accommodating this variability within the framework of traditional automatic speech recognition systems that assume temporally non-overlapping phonetic segments.
Pronunciation ambiguity vs. pronunciation variability in speech recognition
  • M. Saraçlar, S. Khudanpur
  • Computer Science
  • 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)
  • 2000
TLDR
Analysis of manual phonetic transcription of conversational speech reveals a large number of cases of genuine ambiguity: instances where human labelers disagree on the identity of the surface form, and two methods for accommodating pronunciation ambiguity are developed.
Modeling and efficient decoding of large vocabulary conversational speech
TLDR
A finite state machine single-prefix-tree, one-pass, time-synchronous decoder is presented that decodes highly spontaneous speech within this new representational framework.
Dynamic pronunciation models for automatic speech recognition
TLDR
This dissertation examines how pronunciations vary in this speaking style, and how speaking rate and word predictability can be used to predict when greater pronunciation variation can be expected, and suggests that for spontaneous speech, it may be appropriate to build models for syllables and words that can dynamically change the pronunciation used in the speech recognizer based on the extended context.
Pronunciation modelling using a hand-labelled corpus for conversational speech recognition
  • W. Byrne, M. Finke, +6 authors G. Zavaliagkos
  • Computer Science
  • Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181)
  • 1998
TLDR
It is demonstrated that the improvement in recognition performance from pronunciation modelling persists as the system is enhanced with better acoustic and language models.
Dictionary learning for spontaneous speech recognition
  • T. Sloboda, A. Waibel
  • Computer Science
  • Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96
  • 1996
TLDR
This work proposes a data-driven approach for adding new pronunciations to a given phonetic dictionary so that they model the observed occurrences of words in the database, and shows how this algorithm can be extended to produce alternative pronunciations for word tuples and frequently misrecognized words.
Modeling pronunciation variation for ASR: A survey of the literature
TLDR
This contribution provides an overview of the publications on pronunciation variation modeling in automatic speech recognition, paying particular attention to the papers in this special issue and the papers presented at 'the Rolduc workshop'.