Kham Nguyen

In this paper, we present a novel approach to morphological decomposition in large-vocabulary Arabic speech recognition. It achieved a low out-of-vocabulary (OOV) rate and high recognition accuracy in a state-of-the-art Arabic broadcast news transcription system. In this approach, compound words are decomposed into stems and affixes in both …
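To make "decomposed into stems and affixes" concrete, here is a minimal greedy affix-stripping sketch over Buckwalter-transliterated words. It is not the paper's method, whose details are truncated above. The affix lists, the `MIN_STEM` limit, and the functions `decompose` and `recombine` are all hypothetical. Affix tokens keep a `+` marker so that recognized units can be glued back into words.

```python
from typing import List

# Hypothetical affix inventories in Buckwalter transliteration; a real system
# would use a much larger, linguistically curated list.
PREFIXES = ["Al", "w", "f", "b", "l"]
SUFFIXES = ["hmA", "hA", "hm", "h", "k"]
MIN_STEM = 3  # hypothetical: never strip an affix if the stem would get shorter


def decompose(word: str) -> List[str]:
    """Split a word into prefix tokens ('w+'), a stem, and a suffix token ('+hA')."""
    prefixes: List[str] = []
    stem = word
    stripped = True
    while stripped:  # peel prefixes left to right, longest match first
        stripped = False
        for p in sorted(PREFIXES, key=len, reverse=True):
            if stem.startswith(p) and len(stem) - len(p) >= MIN_STEM:
                prefixes.append(p + "+")
                stem = stem[len(p):]
                stripped = True
                break
    suffixes: List[str] = []
    for s in sorted(SUFFIXES, key=len, reverse=True):  # strip at most one suffix
        if stem.endswith(s) and len(stem) - len(s) >= MIN_STEM:
            suffixes.append("+" + s)
            stem = stem[: -len(s)]
            break
    return prefixes + [stem] + suffixes


def recombine(tokens: List[str]) -> List[str]:
    """Glue recognized units back into words using the '+' markers."""
    words: List[str] = []
    pending = ""
    for t in tokens:
        if t.startswith("+") and words:
            words[-1] += t[1:]  # suffix attaches to the previous word
        elif t.endswith("+"):
            pending += t[:-1]  # prefix waits for the next stem
        else:
            words.append(pending + t)
            pending = ""
    return words
```

For example, `decompose("wAlktAb")` ("and the book") gives `['w+', 'Al+', 'ktAb']`, while `decompose("bAb")` is left whole because stripping `b` would leave a stem shorter than `MIN_STEM`. Recognizing stems and affixes as separate units is what lets a fixed vocabulary cover more surface word forms, which is the OOV effect the abstract reports.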
In this paper, we describe the BBN 2007 Mandarin speech-to-text system developed for the GALE Evaluation 2007. Compared with the BBN 2006 Mandarin system, we achieved a 25% relative reduction in character error rate on the most important test sets. Using all available training data contributed most to the improvement. The use …
This paper presents a set of experiments we conducted to optimize the performance of an Arabic/English machine translation system on broadcast news and conversational speech data. Proper integration of speech-to-text (STT) and machine translation (MT) requires special attention to issues such as sentence boundary detection, punctuation, STT …
We show progress in Arabic speech recognition by incorporating contextual information into the process of morphological decomposition. The new approach achieves lower out-of-vocabulary and word error rates than our previous work, in which morphological decomposition relied on word-level information only. We also describe how the …
The majority of state-of-the-art speech recognition systems make use of system combination. The combination approaches adopted have traditionally been tuned to minimise word error rates (WERs). In recent years, there has been growing interest in taking the output of speech recognition systems in one language and translating it into another. This paper …
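WER, the metric that combination schemes have traditionally been tuned to minimise, is the standard word-level Levenshtein distance between reference and hypothesis, divided by the number of reference words. A minimal sketch follows. The function name `word_error_rate` is illustrative, and tokenization is simply whitespace splitting.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```

Comparing "the cat sat on the mat" with "the cat sit on mat" yields one substitution and one deletion over six reference words, a WER of 1/3.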
In this paper, we present a method to extract probabilistic acoustic features using the Adaptive Boosting algorithm (AdaBoost). We build phoneme Gaussian mixture classifiers and use AdaBoost to enhance their classification performance. The outputs from AdaBoost are the posterior probabilities of all phonemes for each frame. These posterior features are …
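The idea of turning boosted classifier outputs into per-frame phoneme posteriors can be sketched as follows, under loudly stated assumptions. Decision stumps on toy scalar features stand in for the phoneme Gaussian mixture classifiers the abstract mentions. Each phoneme gets its own one-vs-rest discrete AdaBoost detector. A boosted score F is mapped to a probability with sigmoid(2F), the additive-logistic interpretation of AdaBoost, and the per-phoneme values are then normalized into one distribution per frame. That normalization is a heuristic. All function names are hypothetical, and none of this is claimed to be the paper's exact procedure.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A stump is (feature index, threshold, polarity): it outputs `polarity`
# when x[feature] >= threshold and -polarity otherwise.
Stump = Tuple[int, float, int]


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    j, thr, pol = stump
    return pol if x[j] >= thr else -pol


def fit_adaboost(X: List[Sequence[float]], y: List[int], rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost for labels in {-1, +1} with decision-stump weak learners."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted training error.
        best: Stump = (0, 0.0, 1)
        best_err = float("inf")
        for j in range(len(X[0])):
            for thr in sorted({x[j] for x in X}):
                for pol in (1, -1):
                    err = sum(wi for wi, xi, yi in zip(w, X, y)
                              if stump_predict((j, thr, pol), xi) != yi)
                    if err < best_err:
                        best, best_err = (j, thr, pol), err
        if best_err >= 0.5:
            break  # weak learner is no better than chance
        eps = max(best_err, 1e-10)  # avoid log(0) for a perfect stump
        alpha = 0.5 * math.log((1.0 - eps) / eps)
        ensemble.append((alpha, best))
        if best_err == 0:
            break  # training frames already separated
        # Up-weight misclassified frames, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(best, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def score(ensemble: List[Tuple[float, Stump]], x: Sequence[float]) -> float:
    return sum(alpha * stump_predict(s, x) for alpha, s in ensemble)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_phoneme_models(X, labels, rounds=10) -> Dict[str, list]:
    """One-vs-rest boosted detector per phoneme label."""
    return {p: fit_adaboost(X, [1 if lab == p else -1 for lab in labels], rounds)
            for p in sorted(set(labels))}


def posterior_features(models: Dict[str, list], x: Sequence[float]) -> Dict[str, float]:
    # sigmoid(2F) reads a boosted score F as P(phoneme | frame); normalizing
    # across phonemes yields one posterior vector per frame.
    raw = {p: sigmoid(2.0 * score(m, x)) for p, m in models.items()}
    z = sum(raw.values())
    return {p: v / z for p, v in raw.items()}
```

On toy one-dimensional "frames" such as `[[0.0], [0.1], [1.0], [1.1], [2.0], [2.1]]` labelled `a, a, b, b, c, c`, each frame's posterior vector sums to one and peaks at its own label. In a real front end, vectors like these would be appended to or replace the acoustic features.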
Citation: Tim Ng et al., "Improved morphological decomposition for Arabic broadcast news transcription." The article is made available in accordance with the publisher's policy and may be subject to US copyright law; please refer to the publisher's site for terms of use. The MIT Faculty has made this article openly available.