Sanjika Hewavitharana

This paper describes a method to recognize off-line handwritten characters of Sinhala, the language used by the majority of Sri Lankans. The classification approach is based on discrete hidden Markov models. A subset of the Sinhala alphabet was chosen for the study. The unknown characters are first pre-classified into one of three character groups, based on the…
This article presents a database of images of handwritten city names. The aim is to provide a standard database for Sinhala handwriting recognition research. This database contains about 15,000 images of about 500 city names of Sri Lanka. These images are obtained from the addresses of live mail so that the writers had no idea that they would be used for…
We describe a translation model adaptation approach for conversational spoken language translation (CSLT), which encourages the use of contextually appropriate translation options from relevant training conversations. Our approach employs a monolingual LDA topic model to derive a similarity measure between the test conversation and the set of training…
In this paper we describe the components of our statistical machine translation system used for the spoken language translation evaluation campaign. This system is based on phrase-to-phrase translations extracted from a bilingual corpus. A new phrase alignment approach is introduced, which finds the target phrase by optimizing the overall…
In this paper we present the results for building a grapheme-based speech recognition system for Thai. We experiment with different settings for the initial context-independent system, different numbers of acoustic models, and different contexts for the speech unit. In addition, we investigate the potential of an enhanced tree clustering method as a way of…
We describe a novel two-way speech-to-speech (S2S) translation system that actively detects a wide variety of common error types and resolves them through user-friendly dialog with the user(s). We present algorithms for detecting out-of-vocabulary (OOV) named entities and terms, sense ambiguities, homophones, idioms, ill-formed input, etc. and discuss…
This paper describes the UKA/CMU statistical machine translation system used in the IWSLT 2006 evaluation campaign. The system is based on phrase-to-phrase translations extracted from a bilingual corpus. We compare two different phrase alignment techniques, both based on word alignment probabilities. The system was used for all language pairs and data…
Email is one of the most prevalent communication tools today, and solving the email overload problem is pressingly urgent. A good way to alleviate email overload is to automatically prioritize received messages according to the priorities of each user. However, research on statistical learning methods for fully personalized email prioritization has been…