• Corpus ID: 16458263

Pronunciation Modeling of Mandarin Casual Speech

  • Pascale Fung, William J. Byrne, Zheng Thomas, Teresa M. Kamm, Liu Yi, Song Zhanjiang, Veera Venkataramani, Umar Ruhi

Model partial pronunciation variations for spontaneous Mandarin speech recognition

  • Yi Y. Liu, Pascale Fung
  • Physics
    7th International Conference on Spoken Language Processing (ICSLP 2002)
  • 2002
Modeling pronunciation variations is a critical part of spontaneous Mandarin speech recognition. Such variations include both complete changes and partial changes. Complete changes can usually be

Pronunciation Modeling for Spontaneous Mandarin Speech Recognition

It is shown that partial changes are much less clear-cut than previously assumed and cannot be modelled merely by representing them with alternate phone units; the approach can be applied to any automatic speech recognition system based on subword units.

State-dependent phonetic tied mixtures with pronunciation modeling for spontaneous speech recognition

A state-dependent phonetic tied-mixture model with variable codebook size is proposed; it incorporates a state-level pronunciation model for better discrimination of phonetic and acoustic confusions while reducing model complexity.

Modeling partial pronunciation variations for spontaneous Mandarin speech recognition

Towards Improved Assessment of Phonotactic Information for Automatic Language Identification

This investigation uses the CallHome corpus, on the premise that it better represents the style of discourse and channel conditions encountered in conversational telephone speech (CTS), which is the focus of current NIST LID evaluations.


Regional accents in Mandarin speech result mostly from partial phone changes due to the interlanguage system of non-native speakers. We propose partial change accent models based on accent-specific

Partial Change Phone Models for Pronunciation Variations in Spontaneous Mandarin Speech

The pre-trained acoustic model is reconstructed by sharing Gaussian mixtures between canonical phone models and partial change phone models at the state level, which improves the resolution of the acoustic model to accommodate partial changes.

English-Chinese Name Machine Transliteration Using Search and Neural Network Models

It is found that search-based methods outperform deep learning ones, likely due to the relatively small number of English names with standard Chinese translations in the accessible dataset, and that incorporating syllable length heuristics and phonetic information into the search improves performance significantly.

Joint training methods for tandem and hybrid speech recognition systems using deep neural networks

Cambridge International Scholarship, Cambridge Overseas Trust Research funding, EPSRC Natural Speech Technology Project Research funding, DARPA BOLT Program Research funding, iARPA Babel Program

Reliable Accent-Specific Unit Generation With Discriminative Dynamic Gaussian Mixture Selection for Multi-Accent Chinese Speech Recognition

The proposed DGMS framework is able to cover more multi-accent changes, thus reducing some of the performance loss in pruned beam search without increasing the model size of the original acoustic model set.

Pronunciation modeling by sharing gaussian densities across phonetic models

The incorporation of pronunciation models into acoustic model training, in addition to recognition, is described, and a 1.7% improvement in recognition accuracy on the Switchboard corpus is reported.

Automatic Generation of Detailed Pronunciation Lexicons

This work explores different ways of “spelling” a word in a speech recognizer’s lexicon and how to obtain those spellings, describing how the different pronunciations are obtained from text-to-speech systems and from procedures that build decision trees trained on phonetically labeled corpora.

A Status Report from WS97

  • presented at IEEE Workshop on Automatic Speech Recognition and Understanding, Santa Barbara, CA, USA, 1997.
  • 1997

Pronunciation modeling for conversational speech recognition

This dissertation provides a fundamental and quantitative insight into pronunciation variability in spontaneous speech and demonstrates techniques for accommodating this variability within the framework of traditional automatic speech recognition systems that assume temporally non-overlapping phonetic segments.

Automatic Speech and Speaker Recognition: Advanced Topics

Automatic Speech and Speaker Recognition: Advanced Topics groups together in a single volume a number of important topics on speech and speaker recognition, topics which are of fundamental importance, but not yet covered in detail in existing textbooks.

Mandarin accent adaptation based on context-independent/context-dependent pronunciation modeling

An accent adaptation approach using pronunciation variation modeling technology for accented Mandarin is proposed; the syllable recognition error rate was reduced by 15% with context-independent SPVD and by 20% with context-dependent SPVD.

An application of SAMPA-c for standard Chinese

The results show that the presented labeling system is suitable for Standard Chinese; it has been used to label two corpora.

Japanese document recognition based on interpolated n-gram model of character

A contextual postprocessing method for Japanese document recognition is described that uses a character trigram model with a deleted interpolation method, and its advantage is demonstrated by practical experiments.

Statistically reliable deleted interpolation

  • N. Kim, C. Un
  • Computer Science
    IEEE Trans. Speech Audio Process.
  • 1997
This work proposes a statistically reliable deleted interpolation (DI) approach that attempts to piecewise linearly approximate the interpolating weight curve based on some reasoning concerned with statistical reliability of sample-based estimates.

The phonetic labeling on read and spontaneous discourse corpora

First, the principles and conventions of transcription are presented; then the two speech styles are compared from phonetic and syntactic points of view, including statistical results for different phonetic units obtained from the annotated corpora.