This paper describes a methodology for semi-automatic grammar induction from unannotated corpora belonging to a restricted domain. The grammar contains both semantic and syntactic structures, which are conducive to language understanding. Our work aims to reduce the dependence of grammar development on expert handcrafting or on the availability of …
We have developed a technique for automatic transliteration of named entities for English-Chinese cross-language spoken document retrieval (CL-SDR). Our retrieval system integrates machine translation, speech recognition and information retrieval technologies. An English news story forms a textual query that is automatically translated into Chinese words, …
This paper describes the development of CU Corpora, a series of large-scale speech corpora for Cantonese. Cantonese is the most commonly spoken Chinese dialect in southern China and Hong Kong. CU Corpora are the first of their kind and are intended to serve as important infrastructure for advancing speech recognition and synthesis technologies for …
This paper describes the use of Belief Networks for mixed-initiative dialog modeling within the context of the CU FOREX system [1]. CU FOREX is a bilingual hotline for real-time foreign exchange inquiries. Presently, it supports two separate interaction modalities: a direct dialog (DD) interaction, which is system-initiated for novice users; as well as …
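The abstract does not say how the belief networks are structured. As a minimal sketch, assuming the simplest possible belief network (one goal node whose binary children record whether each keyword occurs, i.e. naive Bayes), the system can infer which kind of inquiry a free-form user utterance expresses. The goal labels, keywords and probabilities below are hypothetical and are not taken from CU FOREX.

```python
import math
from typing import Dict, Iterable

# Hypothetical goals, priors and P(keyword present | goal); illustrative only.
PRIORS = {"exchange_rate": 0.6, "interest_rate": 0.4}
LIKELIHOODS = {
    "exchange_rate": {"rate": 0.7, "dollar": 0.6, "yen": 0.4, "deposit": 0.05},
    "interest_rate": {"rate": 0.7, "dollar": 0.3, "yen": 0.2, "deposit": 0.6},
}


def infer_goal(words: Iterable[str]) -> Dict[str, float]:
    """Return the posterior P(goal | observed words) under a naive-Bayes belief network."""
    observed = set(words)
    log_scores = {}
    for goal, prior in PRIORS.items():
        score = math.log(prior)
        for word, p in LIKELIHOODS[goal].items():
            # Each keyword node is binary: present or absent in the utterance.
            score += math.log(p if word in observed else 1.0 - p)
        log_scores[goal] = score
    # Normalise in log space for numerical stability.
    best = max(log_scores.values())
    total = sum(math.exp(s - best) for s in log_scores.values())
    return {g: math.exp(s - best) / total for g, s in log_scores.items()}


print(infer_goal("what is the deposit rate".split()))
```

Because the posterior is recomputed from whatever words appear, a user may state the goal in one free-form sentence instead of answering system prompts one at a time, which is the kind of flexibility a mixed-initiative mode needs.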
Hidden Markov models (HMMs) and Gaussian mixture models (GMMs) are the two most common types of acoustic models used in statistical parametric approaches for generating low-level speech waveforms from high-level symbolic inputs via intermediate acoustic feature sequences. However, these models have limitations in representing complex, nonlinear …
In this paper, we present an effective method to detect the language boundary (LB) in code-switching utterances. The utterances are produced mainly in Cantonese, a commonly used Chinese dialect, with English words occasionally inserted between Cantonese words. Bi-phone probabilities are calculated to measure the confidence that the recognized phones …
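One plausible reading of the bi-phone idea, sketched here under stated assumptions: estimate smoothed bigram phone probabilities P(b | a) from Cantonese-only phone sequences, then flag positions in a recognized phone string where the transition probability drops below a threshold, since an inserted English word tends to produce phone pairs that are rare in Cantonese. The phone symbols, the add-one smoothing and the threshold value are illustrative assumptions; the excerpt does not give the paper's actual confidence measure or decision rule.

```python
from collections import Counter
from typing import Callable, List, Sequence


def train_biphone(sequences: Sequence[Sequence[str]], alpha: float = 1.0) -> Callable[[str, str], float]:
    """Estimate add-alpha smoothed bigram phone probabilities P(b | a)."""
    pair_counts: Counter = Counter()
    prev_counts: Counter = Counter()
    vocab = set()
    for seq in sequences:
        vocab.update(seq)
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            prev_counts[a] += 1
    v = len(vocab)

    def prob(a: str, b: str) -> float:
        # Unseen pairs (and unseen phones) fall back to the smoothing mass.
        return (pair_counts[(a, b)] + alpha) / (prev_counts[a] + alpha * v)

    return prob


def low_confidence_positions(phones: Sequence[str], prob: Callable[[str, str], float],
                             threshold: float) -> List[int]:
    """Indices i whose transition phones[i-1] -> phones[i] is less likely than threshold."""
    return [i for i in range(1, len(phones)) if prob(phones[i - 1], phones[i]) < threshold]


# Hypothetical Cantonese-only training phones and a code-switched test string.
cantonese = [["n", "ei", "h", "ou"], ["n", "ei", "h", "ou"], ["s", "ik", "f", "aan"]]
prob = train_biphone(cantonese)
mixed = ["n", "ei", "t", "ae", "k", "h", "ou"]
flagged = low_confidence_positions(mixed, prob, threshold=0.2)
print(flagged)  # [2, 3, 4, 5]
```

In this toy example the flagged run starts where the unfamiliar phones begin (index 2) and ends at the transition back into a familiar Cantonese pair (index 5), so the first and last flagged indices are the candidate language boundaries.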
A popular approach to dialogue management is based on a finite-state model, where user utterances trigger transitions between the dialogue states, and these states, in turn, determine the system's response. This paper describes an alternative dialogue planning algorithm based on the notion of filling in an electronic form, or "E-form." Each slot has …
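To contrast with the finite-state approach, the following minimal sketch treats the dialogue state as an electronic form: the user may fill slots in any order, and the system's next move is derived from which slots are still empty rather than from a fixed transition graph. The slot names, the prompt-the-first-empty-slot policy and the `ANSWER(...)` output are hypothetical; the abstract is cut off before describing what each slot contains.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Slot:
    name: str
    prompt: str
    value: Optional[str] = None


@dataclass
class EForm:
    slots: List[Slot] = field(default_factory=list)

    def fill(self, parsed: Dict[str, str]) -> None:
        """Copy whatever slot values the understanding component found in the user's turn."""
        for slot in self.slots:
            if slot.name in parsed:
                slot.value = parsed[slot.name]

    def next_action(self) -> str:
        """Prompt for the first empty slot, or answer once every slot is filled."""
        for slot in self.slots:
            if slot.value is None:
                return slot.prompt
        filled = ", ".join(f"{s.name}={s.value}" for s in self.slots)
        return f"ANSWER({filled})"


# Hypothetical foreign-exchange form; the user supplies values in any order.
form = EForm([Slot("currency", "Which currency?"), Slot("rate_type", "Buying or selling rate?")])
form.fill({"currency": "yen"})
print(form.next_action())  # Buying or selling rate?
form.fill({"rate_type": "selling"})
print(form.next_action())  # ANSWER(currency=yen, rate_type=selling)
```

Because the response depends only on the form's contents, a user who volunteers several values in one utterance simply skips the corresponding prompts, whereas a strict finite-state model would need extra states and transitions to cover each such combination.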