Sanjika Hewavitharana

This paper describes a method for recognizing off-line handwritten characters of Sinhala, the language used by the majority of Sri Lankans. The classification approach is based on discrete hidden Markov models. A subset of the Sinhala alphabet was chosen for the study. The unknown characters are first pre-classified into one of three character groups, based on the…
We describe a translation model adaptation approach for conversational spoken language translation (CSLT), which encourages the use of contextually appropriate translation options from relevant training conversations. Our approach employs a monolingual LDA topic model to derive a similarity measure between the test conversation and the set of training…
This article presents a database of images of handwritten city names. The aim is to provide a standard database for Sinhala handwriting recognition research. The database contains about 15,000 images of about 500 city names in Sri Lanka. These images were obtained from the addresses of live mail, so the writers had no idea that they would be used for…
In this paper, we present a translation memory (TM) based system to augment a statistical machine translation (SMT) system. It is used for translating sentences that have close matches in the training corpus. Given a test sentence, we first extract sentence pairs from the training corpus whose source side is similar to the test sentence. Then, the TM system…
In this paper we describe the components of our statistical machine translation system used for the spoken language translation evaluation campaign. This system is based on phrase-to-phrase translations extracted from a bilingual corpus. A new phrase alignment approach is introduced, which finds the target phrase by optimizing the overall…
In this paper we present the results of building a grapheme-based speech recognition system for Thai. We experiment with different settings for the initial context-independent system, different numbers of acoustic models, and different contexts for the speech unit. In addition, we investigate the potential of an enhanced tree clustering method as a way of…
We describe a novel two-way speech-to-speech (S2S) translation system that actively detects a wide variety of common error types and resolves them through user-friendly dialog with the user(s). We present algorithms for detecting out-of-vocabulary (OOV) named entities and terms, sense ambiguities, homophones, idioms, ill-formed input, etc. and discuss…
In this paper we describe the CMU statistical machine translation system used in the IWSLT 2005 evaluation campaign. This system is based on phrase-to-phrase translations extracted from a bilingual corpus. We experimented with two different phrase extraction methods: PESA on-the-fly phrase extraction and an alignment-free extraction method. The translation…