Sanjika Hewavitharana

Learn More
We describe a translation model adaptation approach for conversational spoken language translation (CSLT), which encourages the use of contextually appropriate translation options from relevant training conversations. Our approach employs a monolingual LDA topic model to derive a similarity measure between the test conversation and the set of training(More)
In this paper we describe the components of our statistical machine translation system used for the spoken language translation evaluation campaign. This system is based on phrase-to-phrase translations extracted from a bilingual corpus. A new phrase alignment approaches will be introduced, which finds the target phrase by optimizing the overall(More)
This paper describes a system to recognize handwritten Tamil characters using a two-stage classification approach, for a subset of the Tamil alphabet. In the first stage, an unknown character is pre-classified into one of the three groups: core, ascending and descending characters. Then, in the second stage, members of the pre-classified group are further(More)
Named entities (NE), the noun or noun phrases referring to persons, locations and organizations, are among the most informationbearing linguistic structures. Extracting and translating named entities benefits many natural language processing problems such as cross-lingual information retrieval, cross-lingual question answering and machine translation. In(More)
We describe a novel two-way speech-to-speech (S2S) translation system that actively detects a wide variety of common error types and resolves them through user-friendly dialog with the user(s). We present algorithms for detecting out-of-vocabulary (OOV) named entities and terms, sense ambiguities, homophones, idioms, ill-formed input, etc. and discuss(More)
In this paper, we present a translation memory (TM) based system to augment a statistical translation (SMT) system. It is used for translating sentences which have close matches in the training corpus. Given a test sentence, we first extract sentence pairs from the training corpus, whose source side is similar to the test sentence. Then, the TM system(More)
Supervised learning algorithms for identifying comparable sentence pairs from a dominantly non-parallel corpora require resources for computing feature functions as well as training the classifier. In this paper we propose active learning techniques for addressing the problem of building comparable data for low-resource languages. In particular we propose(More)
In this paper we present the results for building a grapheme-based speech recognition system for Thai. We experiment with different settings for the initial context independent system, different number of acoustic models and different contexts for the speech unit. In addition, we investigate the potential of an enhanced tree clustering method as a way of(More)
This paper describes a method to recognize off-line handwritten Sinhala characters, the language used by the majority of Sri Lanka. The classification approach is based on discrete hidden Markov models. A subset of the Sinhala alphabet was chosen for the study. The unknown characters are first pre-classified into one of three character groups, based on the(More)