Tone plays a crucial role in Mandarin speech in distinguishing otherwise ambiguous words. Most state-of-the-art Mandarin automatic speech recognition systems adopt embedded tone modeling, where tonal acoustic units are used and F0 features are appended to the spectral feature vector. In this paper, we combine the embedded approach (using improved F0 smoothing) with ...
PLASER is a multimedia tool with instant feedback designed to teach English pronunciation to high-school students in Hong Kong whose mother tongue is Cantonese Chinese. The objective is to teach correct pronunciation and not to assess a student's overall pronunciation quality. Major challenges related to speech recognition technology include: allowance for ...
This paper explores techniques for utilizing untranscribed training data pools to increase the available training data for automatic speech recognition systems. It has been well established that current speech recognition technology, especially in Large Vocabulary Conversational Speech Recognition (LVCSR), is largely language independent, and that the ...
Lack of data is a problem in training language models for conversational speech recognition, particularly for languages other than English. Experiments in English have successfully used web-based text collection targeted for a conversational style to augment small sets of transcribed speech; here we look at extending these techniques to Mandarin. In ...
Current state-of-the-art statistical speech recognition systems use hidden Markov models (HMMs) for modeling the speech signal. However, it is well known that HMMs do not exploit the time-dependence in the speech process, since they are limited by the assumption of conditional independence of observations given the state sequence. Alternative techniques, ...
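For reference, the conditional-independence assumption mentioned in the abstract above is commonly written as below. The notation (observations o_t, hidden states q_t, sequence length T) is an assumption of this note and is not taken from the paper.

```latex
% Standard HMM factorization (notation assumed, not from the paper):
% each observation depends only on the current state.
\begin{align}
  p(o_{1:T}, q_{1:T}) &= \prod_{t=1}^{T} p(q_t \mid q_{t-1}) \, p(o_t \mid q_t), \\
  p(o_t \mid o_{1:t-1}, q_{1:T}) &= p(o_t \mid q_t),
\end{align}
% where p(q_1 \mid q_0) denotes the initial state distribution.
```

The second line is the limitation the abstract refers to: given the state q_t, earlier observations carry no additional information about o_t.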
This paper describes our approach to the estimation of confidence in the words generated by a speech recognition system. We describe the models and the features employed for confidence estimation. In addition we discuss the characteristics of an information-theoretic metric for assessing the performance of the confidence measure. We provide a simple ...
This article describes a methodology for collecting text from the Web to match a target sublanguage both in style (register) and topic. Unlike other work that estimates n-gram statistics from page counts, the approach here is to select and filter documents, which provides more control over the type of material contributing to the n-gram counts. The data can ...
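The general idea of filtering documents before accumulating n-gram counts can be sketched as follows. This is a minimal illustration only: the filter (fraction of tokens found in a hypothetical `style_words` set), the `min_ratio` threshold, and all function names are assumptions of this sketch, not the selection criteria used in the article.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield successive n-grams (as tuples) from a token list."""
    return zip(*(tokens[i:] for i in range(n)))


def keep_document(tokens, style_words, min_ratio=0.1):
    """Hypothetical filter: keep a document if enough of its tokens
    belong to a set of words typical of the target style."""
    if not tokens:
        return False
    hits = sum(1 for t in tokens if t in style_words)
    return hits / len(tokens) >= min_ratio


def count_ngrams(documents, style_words, n=2, min_ratio=0.1):
    """Accumulate n-gram counts over the documents that pass the filter."""
    counts = Counter()
    for text in documents:
        tokens = text.lower().split()  # naive whitespace tokenization
        if keep_document(tokens, style_words, min_ratio):
            counts.update(ngrams(tokens, n))
    return counts


docs = ["you know i mean it", "quarterly revenue rose sharply"]
style = {"you", "know", "i", "mean"}
counts = count_ngrams(docs, style, n=2, min_ratio=0.5)
```

In this toy run only the first document passes the filter, so only its bigrams contribute to `counts`; rejected documents leave no trace in the statistics, which is the kind of control over contributing material that the abstract describes.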
The introduction of Aurora 4 tasks provides a standard database and methodology for comparing the effectiveness of different robust algorithms on LVCSR. One important issue on Aurora 4 tasks is the computation time involved in evaluating different test conditions. In this paper we show that by employing HTK as the recognition frontend and backend on Aurora ...