Corpus ID: 1432307

Automatic phone set extension with confidence measure for spontaneous speech

Y. Liu and Pascale Fung
Extending the phone set is one common approach to dealing with phonetic confusions in spontaneous speech. We propose using a likelihood ratio test as a confidence measure for automatic phone set extension to model phonetic confusions. We first extend the standard phone set using dynamic programming (DP) alignment to cover all possible phonetic confusions in the training data. The likelihood ratio test is then used as a confidence measure to optimize the extended phonetic units to represent the acoustic…
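The DP alignment step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a plain Levenshtein alignment with unit costs between a canonical (dictionary) phone sequence and a surface (transcribed) one; the function names `align` and `confusion_counts`, the `GAP` symbol, and the example phone labels are hypothetical and are not taken from the paper. The likelihood ratio test that filters the resulting candidates is not shown.

```python
from collections import Counter

GAP = "-"  # hypothetical marker for an inserted or deleted phone


def align(canonical, surface):
    """Align two phone sequences with unit-cost edit distance.

    Returns a list of (canonical_phone, surface_phone) pairs, using GAP
    where a phone was deleted or inserted.
    """
    n, m = len(canonical), len(surface)
    # cost[i][j] = minimum edit cost between canonical[:i] and surface[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (canonical[i - 1] != surface[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                cost[i][j] == cost[i - 1][j - 1] + (canonical[i - 1] != surface[j - 1])):
            pairs.append((canonical[i - 1], surface[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((canonical[i - 1], GAP))  # phone deleted in surface form
            i -= 1
        else:
            pairs.append((GAP, surface[j - 1]))  # phone inserted in surface form
            j -= 1
    pairs.reverse()
    return pairs


def confusion_counts(utterances):
    """Count every mismatched (canonical, surface) pair over a corpus."""
    counts = Counter()
    for canonical, surface in utterances:
        for c, s in align(canonical, surface):
            if c != s:
                counts[(c, s)] += 1
    return counts
```

Each mismatched pair counted this way would be a candidate for an extended phone unit; deciding which candidates to keep is the role the abstract assigns to the confidence measure.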
Acoustic modeling using an extended phone set considering cross-lingual pronunciation variations
This work uses two-step agglomerative hierarchical clustering with the delta Bayesian information criterion to automatically generate a merged extended phone set (MEPS), and uses a parametric modeling technique, model complexity selection, to set the final number of Gaussian components according to the available training data under unbalanced data conditions.
Phonetic confusion analysis and robust phone set generation for Shanghai-accented Mandarin speech recognition
Experimental results show that, compared to the canonical phone set, the generated one can greatly reduce substitution errors and achieve a 0.72% absolute reduction in Chinese character error rate (CER).
Continuous phone recognition without target language training data
This paper addresses the key issue of designing attribute-to-phone mapping models by designing a phone-based background model for each speech attribute detector to improve attribute detection, and shows that the proposed approach decreases the false rejection rate of attribute detection and improves phone recognition accuracy.
HMM-based phonemic distance in different speaking styles and its influence on substitutions in Mandarin speech recognition
The qualitative relationship between phone size and recognition error rate is analyzed, showing that, for a particular phoneme, pronunciation variety is one of the reasons for misidentification during recognition, which suggests a new way to reduce substitution errors.
State-dependent phonetic tied mixtures with pronunciation modeling for spontaneous speech recognition
A state-dependent phonetic tied-mixture model with variable codebook size is proposed; it incorporates a state-level pronunciation model for better discrimination of phonetic and acoustic confusions while reducing model complexity.
Regional accents in Mandarin speech result mostly from partial phone changes due to the interlanguage system of non-native speakers. We propose partial change accent models based on accent-specific…
Partial change accent models for accented Mandarin speech recognition
  • L. Yi, Pascale Fung
  • Computer Science
  • 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721)
  • 2003
This work proposes partial change accent models based on accent-specific units with acoustic model reconstruction for accented Mandarin speech recognition using phonological rules of dialectical pronunciations together with likelihood ratio test to model actual accented variants rather than inherent phonetic confusions, recognizer errors or other data-specific variations.
Optimizing the acoustic modeling from an unbalanced bi-lingual corpus
  • D. Lyu, Ren-Yuan Lyu
  • Computer Science
  • 2008 IEEE International Conference on Acoustics, Speech and Signal Processing
  • 2008
This paper presents a data-driven approach not only for acquiring a proper phoneme set but also for optimizing the acoustic modeling in this situation, using agglomerative hierarchical clustering with the delta Bayesian information criterion.
Acoustic Model Optimization for Multilingual Speech Recognition
A three-step data-driven phone clustering method is used to train a multilingual acoustic model on an unbalanced trilingual corpus, with a parametric modeling technique, model complexity selection, adjusting the number of Gaussian components in each Gaussian mixture to balance the acoustic model between the new phoneme set and the available training data.
Review of spoken dialogue systems
Spoken dialogue systems are computer programs developed to interact with users employing speech in order to provide them with specific automated services. The interaction is carried out by means of…


Generation of robust phonetic set and decision tree for Mandarin using chi-square testing
A statistical method based on chi-square testing is used to investigate which phonetic units are confusable and to develop a more reliable phonetic set, named modified SAMPA-C; results show that an encouraging improvement in recognition performance can be obtained.
Dynamic pronunciation models for automatic speech recognition
This dissertation examines how pronunciations vary in this speaking style and how speaking rate and word predictability can be used to predict when greater pronunciation variation can be expected. It suggests that, for spontaneous speech, it may be appropriate to build models for syllables and words that can dynamically change the pronunciation used in the speech recognizer based on the extended context.
CASS: a phonetically transcribed corpus of Mandarin spontaneous speech
A corpus of spoken Chinese has been collected and phonetically annotated to capture spontaneous speech and language effects, and will be used at the 2000 Johns Hopkins University Language Engineering Workshop by the project on Pronunciation Modeling of Mandarin Casual Speech.
Modeling partial pronunciation variations for spontaneous Mandarin speech recognition
It is shown that partial changes are a lot less clear-cut than previously assumed and cannot be modeled merely by alternate phones or a concatenation of phone units; a partial change phone model (PCPM) is proposed to differentiate pronunciation variations.
Automatic generation of pronunciation lexicons for Mandarin spontaneous speech
  • W. Byrne, V. Venkataramani, +5 authors Umar Ruhi
  • Computer Science
  • 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221)
  • 2001
Pronunciation modeling for large vocabulary speech recognition attempts to improve recognition accuracy by identifying and modeling pronunciations that are not in the ASR system's pronunciation…
Joint acoustic unit design and lexicon generation
  • Proc. Workshop on modeling pronunciation variation for ASR
  • 1998