Pronunciation modeling by sharing Gaussian densities across phonetic models

@article{Saralar1999PronunciationMB,
  title={Pronunciation modeling by sharing {G}aussian densities across phonetic models},
  author={M. Saraçlar and H. Nock and S. Khudanpur},
  journal={Comput. Speech Lang.},
  year={1999},
  volume={14},
  pages={137--160}
}
Conversational speech exhibits considerable pronunciation variability, which has been shown to have a detrimental effect on the accuracy of automatic speech recognition. There have been many attempts to model pronunciation variation, including the use of decision trees to generate alternate word pronunciations from phonemic baseforms. Use of pronunciation models during recognition is known to improve accuracy. This paper describes the incorporation of pronunciation models into acoustic model…
Pronunciation ambiguity vs. pronunciation variability in speech recognition
  • M. Saraçlar, S. Khudanpur
  • Computer Science
  • 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)
  • 2000
TLDR
Analysis of manual phonetic transcription of conversational speech reveals a large number of cases of genuine ambiguity: instances where human labelers disagree on the identity of the surface form, and two methods for accommodating pronunciation ambiguity are developed.
Pronunciation change in conversational speech and its implications for automatic speech recognition
TLDR
It is demonstrated here that in most cases, the change is only partial; a phone is not completely deleted or substituted by another phone but is modified only partially, and two methods are suggested for accommodating such partial pronunciation change in the automatic recognition of spontaneous speech.
Pronunciation variation speech recognition without dictionary modification on sparse database
TLDR
Sharing Gaussian densities across phonetic models and using decision trees for pronunciation variation are shown to be efficient for a pronunciation variation system without dictionary modification.
Modeling Cantonese Pronunciation Variations for Large-Vocabulary Continuous Speech Recognition
TLDR
Experimental results show that the use of a pronunciation variation dictionary and the method of dynamic search space expansion can improve speech recognition performance substantially.
Modeling partial pronunciation variations for spontaneous Mandarin speech recognition
TLDR
It is shown that partial changes are a lot less clear-cut than previously assumed and cannot be modeled merely by alternate phones or a concatenation of phone units, and a partial change phone model (PCPM) is proposed to differentiate pronunciation variations.
Implicit modelling of pronunciation variation in automatic speech recognition
TLDR
A method for the stepwise reduction of the number of pronunciation variants per word to one is described and it is shown that the use of single pronunciation dictionaries provides similar or better word error rate performance, achieved both on Wall Street Journal and Switchboard data.
Symbolic phonetic features for modeling of pronunciation variation
TLDR
A phonetic-feature-based prediction model is presented where phones are represented by a vector of symbolic features that can be on, off, unspecified or unused, and experiments show that feature-based models benefit from prosody cues, but not text, and that phone-based models do not benefit from any of the high-level cues explored here.
Dynamic pronunciation models for automatic speech recognition
TLDR
This dissertation examines how pronunciations vary in this speaking style, and how speaking rate and word predictability can be used to predict when greater pronunciation variation can be expected, and suggests that for spontaneous speech, it may be appropriate to build models for syllables and words that can dynamically change the pronunciation used in the speech recognizer based on the extended context.
Multilingual recognition of non-native speech using acoustic model transformation and pronunciation modeling
TLDR
A novel approach is presented for introducing confusion rules, automatically learned through pronunciation modelling, into the recognition system; it gives better recognition results than the classical acoustic adaptation of HMM when the foreign origin of the speaker is known.
Acoustic Pronunciation Variations Modeling for Standard Malay Speech Recognition
TLDR
Experimental results show that the use of a pronunciation variation dictionary and the method of dynamic search space expansion can improve speech recognition performance substantially.

References

SHOWING 1-10 OF 45 REFERENCES
Automatic modeling of pronunciation variations
  • E. Eide
  • Computer Science
  • EUROSPEECH
  • 1999
TLDR
An automatic method is reported for discovering an appropriate model for each context-dependent phoneme, allowing for such phenomena as reduced pronunciations and substituted phonemes where warranted by observation on training data.
Modeling and efficient decoding of large vocabulary conversational speech
TLDR
A finite state machine single-prefix-tree, one-pass, time-synchronous decoder is presented that decodes highly spontaneous speech within this new representational framework.
Pronunciation modelling using a hand-labelled corpus for conversational speech recognition
  • W. Byrne, M. Finke, +6 authors G. Zavaliagkos
  • Computer Science
  • Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181)
  • 1998
TLDR
It is demonstrated that the improvement in recognition performance from pronunciation modelling persists as the system is enhanced with better acoustic and language models.
Speaking mode dependent pronunciation modeling in large vocabulary conversational speech recognition
TLDR
A framework for speaking mode dependent pronunciation modeling is presented and the framework is successfully applied to increase the performance of the state-of-the-art Janus Recognition Toolkit Switchboard recognizer.
Pronunciation modelling for conversational speech recognition: a status report from WS97
TLDR
A use of hand-labelled phonetic transcriptions of a portion of the Switchboard corpus is illustrated, in conjunction with statistical techniques, to learn alternatives to canonical pronunciations of words.
Word juncture modeling using phonological rules for HMM-based continuous speech recognition
TLDR
Results, which are evaluated on the 991-word speaker-independent DARPA task, show that phonological rules are effective in providing corrective capability at low computational cost.
Dictionary learning for spontaneous speech recognition
  • T. Sloboda, A. Waibel
  • Computer Science
  • Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96
  • 1996
TLDR
This work proposes a data-driven approach to add new pronunciations to a given phonetic dictionary in a way that they model the given occurrences of words in the database, and shows how this algorithm can be extended to produce alternative pronunciations for word tuples and frequently misrecognized words.
A statistical model for generating pronunciation networks
  • M. Riley
  • Computer Science
  • [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing
  • 1991
TLDR
A procedure is presented that builds decision trees, trained on the TIMIT database, using some of these features to predict pronunciation alternatives, and the resulting phonetic network predicts the correct pronunciation of a phoneme on test data from the same corpus approximately 83% of the time.
Fabricating conversational speech data with acoustic models: a program to examine model-data mismatch
TLDR
There is a substantial mismatch between real speech and the combination of the authors' acoustic models and the pronunciations in their recognition dictionary, and the use of simulation appears to be a promising tool in the efforts to understand and reduce the size of this mismatch.
Multiple-pronunciation lexical modeling in a speaker independent speech understanding system
TLDR
An algorithm for the construction of models that attempt to capture the variation that occurs in the pronunciations of words in spontaneous speech, which improves the performance of both the speech recognition and the speech understanding components of the BeRP system.