Learn More
Tone has a crucial role in Mandarin speech in distinguishing ambiguous words. Most state-of-the-art Mandarin automatic speech recognition systems adopt embedded tone modeling, where tonal acoustic units are used and F 0 features are appended to the spectral feature vector. In this paper, we combine the embedded aproach (using improved F 0 smoothing) with(More)
Lack of data is a problem in training language models for conversational speech recognition, particularly for languages other than English. Experiments in English have successfully used web-based text collection targeted for a conversational style to augment small sets of transcribed speech; here we look at extending these techniques to Mandarin. In(More)
This paper explores techniques for utilizing untranscribed training data pools to increase the available training data for automatic speech recognition systems. It has been well established that current speech recognition technology, especially in Large Vocabulary Conversational Speech Recognition (LVCSR), is largely language independent, and that the(More)
This article describes a methodology for collecting text from the Web to match a target sublanguage both in style (register) and topic. Unlike other work that estimates n-gram statistics from page counts, the approach here is to select and filter documents, which provides more control over the type of material contributing to the n-gram counts. The data can(More)
The introduction of Aurora 4 tasks provides a standard database and methodology for comparing the effectiveness of different robust algorithms on LVCSR. One important issue on Aurora 4 tasks is the computation time involved in evaluating different test conditions. In this paper we show that by employing HTK as the recognition frontend and backend on Aurora(More)
This paper presents the 1997 BBN Byblos Large Vocabulary Speech Recognition (LVCSR) system. We give an outline of the algorithms and procedures used to train the system, describe the recognizer configuration and present the major technological innovations that lead to performance improvements. The major testbed we present our results for is the Switchboard(More)