This paper presents an eigenspace-based fast speaker adaptation approach that can improve the modeling accuracy of conventional maximum likelihood linear regression (MLLR) techniques when only very limited adaptation data are available. The proposed eigenspace-based MLLR approach was developed by introducing an a priori knowledge analysis on the training …
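For readers unfamiliar with MLLR, the standard MLLR mean transform and one generic way of constraining it to an eigenspace can be written as follows. This is an illustrative sketch, not necessarily the paper's formulation: the mean transform $\bar{\mathbf{W}}$ of the training speakers, the eigen-directions $\mathbf{e}_k$ (leading principal components of the vectorized training-speaker transforms), and the dimension $K$ are assumptions introduced here.

```latex
\begin{align}
\hat{\boldsymbol{\mu}} &= \mathbf{W}\boldsymbol{\xi}, \quad
\boldsymbol{\xi} = [\,1,\ \boldsymbol{\mu}^{\top}\,]^{\top}, \quad
\mathbf{W} \in \mathbb{R}^{n \times (n+1)} \\
\operatorname{vec}(\mathbf{W}) &= \operatorname{vec}(\bar{\mathbf{W}})
  + \sum_{k=1}^{K} w_k \mathbf{e}_k, \quad K \ll n(n+1)
\end{align}
```

Under such a constraint only the $K$ weights $w_k$ have to be estimated from the new speaker's data, instead of all $n(n+1)$ entries of $\mathbf{W}$, which is why prior knowledge from the training speakers can help when adaptation data are scarce.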
This paper presents an entropy-based algorithm for accurate and robust endpoint detection for speech recognition in noisy environments. Instead of conventional energy-based features, a spectral-entropy feature is developed to identify the speech segments accurately. Experimental results show that this algorithm outperforms the energy-based algorithms …
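The core idea of a spectral-entropy feature can be sketched as follows: each frame's power spectrum is normalized into a probability mass function, and its entropy is low for peaky (speech-like) spectra and high for flat (noise-like) spectra, independent of overall signal level. This is a minimal sketch under stated assumptions, not the paper's algorithm: the function names, the threshold value, and the toy tone-plus-noise signal are hypothetical, and the paper's band limits, smoothing, and decision logic are not reproduced.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """One-sided power spectrum |X[k]|^2, k = 0..N/2, via a direct DFT (O(N^2), fine for a sketch)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]


def spectral_entropy(frame):
    """Entropy (in nats) of the power spectrum normalized to a probability mass function."""
    ps = power_spectrum(frame)
    total = sum(ps)
    if total == 0.0:
        # Digital silence: assign the maximum (flat-spectrum) entropy so it is treated as non-speech.
        return math.log(len(ps))
    probs = [p / total for p in ps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def detect_endpoints(frames, threshold):
    """Return (first, last) indices of frames whose entropy is below threshold, or None.

    Hypothetical decision rule: low entropy => speech. Real detectors add
    smoothing and hangover logic, which is omitted here.
    """
    speech = [i for i, f in enumerate(frames) if spectral_entropy(f) < threshold]
    return (speech[0], speech[-1]) if speech else None


# Toy signal: low-level noise frames around two frames containing a strong tone
# (a crude stand-in for voiced speech with a peaky spectrum).
rng = random.Random(0)
N = 64


def noise_frame(sigma=0.05):
    return [rng.gauss(0.0, sigma) for _ in range(N)]


def tone_frame(bin_index=5, sigma=0.05):
    return [math.sin(2 * math.pi * bin_index * t / N) + rng.gauss(0.0, sigma) for t in range(N)]


frames = [noise_frame(), noise_frame(), tone_frame(), tone_frame(), noise_frame()]
print(detect_endpoints(frames, threshold=1.0))  # (2, 3)
```

Because the spectrum is normalized before the entropy is taken, scaling a frame up or down does not change its feature value, which illustrates why an entropy feature can be less sensitive to noise level than raw frame energy.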
Cepstral mean subtraction (CMS) and cepstral normalization (CN) have been widely used to normalize the first and second moments of cepstral coefficients, and have proved very helpful for robust speech recognition (Furui, S., 1981; Viikki, O. and Laurila, K., 1998). A unified formulation for higher order cepstral moment normalization (HOCMN) is …
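CMS and CN, applied per utterance, can be sketched as below. CMS helps because a fixed linear channel multiplies the spectrum, which becomes an additive constant in the cepstral domain, so subtracting each coefficient's mean removes it; CN additionally scales each coefficient to unit variance. Assumptions: features arrive as a list of equal-length frames, statistics are computed over the whole utterance (on-line or windowed variants also exist), and the function names are illustrative. The higher-order HOCMN generalization is not reproduced because its formulation is not given here.

```python
import math


def cepstral_mean_subtraction(frames):
    """CMS: subtract each coefficient's utterance-level mean (normalizes the first moment)."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


def cepstral_normalization(frames, eps=1e-10):
    """CN: CMS followed by division by each coefficient's standard deviation (normalizes the second moment)."""
    centered = cepstral_mean_subtraction(frames)
    n, dim = len(centered), len(centered[0])
    stds = [math.sqrt(sum(f[d] ** 2 for f in centered) / n) for d in range(dim)]
    # eps guards against division by zero for a constant coefficient.
    return [[f[d] / (stds[d] + eps) for d in range(dim)] for f in centered]


# Three 2-dimensional "cepstral" frames (toy values).
frames = [[1.0, 10.0], [3.0, 14.0], [5.0, 18.0]]
print(cepstral_mean_subtraction(frames))  # [[-2.0, -4.0], [0.0, 0.0], [2.0, 4.0]]
```

Adding the same constant vector to every frame (a toy stand-in for a stationary channel) leaves the CMS output unchanged, and after CN every coefficient has zero mean and unit variance over the utterance.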
In many applications, Chinese information is often provided in the form of phonetic symbol sequences, and it is desirable to decode such sequences into the corresponding Chinese character sequences (sentences) as the output. Phonetic input of Chinese characters into computers is a typical example. The problem is due primarily to the high degree of …
Techniques for unsupervised discovery of acoustic patterns are becoming increasingly attractive, because huge quantities of speech data are now available while manual annotations remain hard to acquire. In this paper, we propose an approach for unsupervised discovery of linguistic structure for the target spoken language given raw speech data. This …
With the rapidly growing use of audio and multimedia information on the Internet, technology for retrieving speech information using voice queries is becoming increasingly important. In this paper, considering the monosyllabic structure of the Chinese language, a whole class of syllable-based indexing features, including overlapping segments of …
This correspondence presents the first known results of complete recognition of continuous Mandarin speech for the Chinese language with a very large vocabulary but very limited training data. Various acoustic and linguistic processing techniques were developed, and a prototype system of a continuous speech Mandarin dictation machine has been successfully …
Error pattern detection is very helpful in Computer-Aided Pronunciation Training (CAPT). This paper reports work on modeling and detecting Error Patterns defined by language teachers based on their linguistic knowledge and pedagogical experience. We develop a model generation framework to create the Error Pattern models from existing phoneme models. We …
This paper presents a set of corpus-based text-to-speech synthesis technologies for Mandarin Chinese. A large speech corpus produced by a single speaker is used, and the speech output is synthesized from waveform units of variable lengths, with desired linguistic properties, retrieved from this corpus. Detailed methodologies were developed for designing …