This paper describes a methodology for semi-automatic grammar induction from unannotated corpora belonging to a restricted domain. The grammar contains both semantic and syntactic structures, which are conducive to language understanding. Our work aims to reduce the reliance of grammar development on expert handcrafting or the availability of ...
This paper describes the development of CU Corpora, a series of large-scale speech corpora for Cantonese. Cantonese is the most commonly spoken Chinese dialect in Southern China and Hong Kong. CU Corpora are the first of their kind and intended to serve as an important infrastructure for the advancement of speech recognition and synthesis technologies for ...
This paper describes the use of Belief Networks for mixed-initiative dialog modeling within the context of the CU FOREX system [1]. CU FOREX is a bilingual hotline for real-time foreign exchange inquiries. Presently, it supports two separate interaction modalities: a direct dialog (DD) interaction, which is system-initiated for novice users; as well as ...
This paper presents recent extensions to our ongoing effort in developing speech recognition for automatic mispronunciation detection and diagnosis in the interlanguage of Chinese learners of English. We have developed a set of context-sensitive phonological rules based on cross-language (Cantonese versus English) analysis which has also been validated ...
Hidden Markov models (HMMs) and Gaussian mixture models (GMMs) are the two most common types of acoustic models used in statistical parametric approaches for generating low-level speech waveforms from high-level symbolic inputs via intermediate acoustic feature sequences. However, these models have their limitations in representing complex, nonlinear ...
A popular approach to dialogue management is based on a finite-state model, where user utterances trigger transitions between the dialogue states, and these states, in turn, determine the system's response. This paper describes an alternative dialogue planning algorithm based on the notion of filling in an electronic form, or "E-form." Each slot has ...
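The finite-state model mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the state names, intent labels, and prompts below are hypothetical, and the sketch assumes user utterances have already been mapped to intent labels. It shows only the baseline finite-state approach, not the E-form algorithm, whose details are truncated in this listing.

```python
# Minimal finite-state dialogue manager (illustrative sketch only).
# All states, intents, and prompts are hypothetical, not from the paper.

# (current_state, user_intent) -> next_state
TRANSITIONS = {
    ("ask_city", "give_city"): "ask_date",
    ("ask_date", "give_date"): "confirm",
    ("confirm", "yes"): "done",
    ("confirm", "no"): "ask_city",
}

# Each state determines the system's response.
PROMPTS = {
    "ask_city": "Which city?",
    "ask_date": "Which date?",
    "confirm": "Shall I go ahead?",
    "done": "Thank you.",
}


def step(state, intent):
    """Return (next_state, system_response).

    An intent with no transition from the current state leaves the
    state unchanged, so the same prompt is repeated.
    """
    next_state = TRANSITIONS.get((state, intent), state)
    return next_state, PROMPTS[next_state]


def run(intents, start="ask_city"):
    """Feed a sequence of user intents through the state machine."""
    state = start
    responses = [PROMPTS[state]]
    for intent in intents:
        state, response = step(state, intent)
        responses.append(response)
    return state, responses
```

Calling `run(["give_city", "give_date", "yes"])` walks the machine from `ask_city` to `done`. The rigidity is visible in `step`: any input the current state does not expect is ignored, which is the kind of limitation a form-filling approach is meant to relax.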
This paper describes our initial implementation of a system to provide worldwide weather information over the telephone. The information is gathered from several different sites on the Web, preprocessed, and cached locally into a relational database to make access both fast and selective. Our natural language tools, originally developed for processing user ...
This work aims to derive salient mispronunciations made by Chinese (L1 being Cantonese) learners of English (L2 being American English) in order to support the design of pedagogical and remedial instructions. Our approach is grounded on the theory of language transfer and involves systematic phonological comparison between two languages to predict possible ...