Learn More
Many Word Sense Disambiguation (WSD) algorithms do not take into account the morphological variations in the language. However, as Indian languages are highly inflected languages, we investigate whether morphology must be taken into account for WSD for Indian languages, as they are very rich in morphology. This paper analyses the influence of morphology in(More)
In this paper, we propose a novel finetuning algorithm for the recently introduced multi-way, multilingual neural machine translate that enables zero-resource machine translation. When used together with novel many-to-one translation strategies, we empirically show that this finetuning algorithm allows the multi-way, multilingual model to translate a(More)
We present a universal Parts-of-Speech (POS) tagset framework covering most of the Indian languages (ILs) following the hierarchical and decomposable tagset schema. In spite of significant number of speakers, there is no workable POS tagset and tagger for most ILs, which serve as fundamental building blocks for NLP research. Existing IL POS tagsets are(More)
Statistical machine translation is often faced with the problem of combining training data from many diverse sources into a single translation model which then has to translate sentences in a new domain. We propose a novel approach, ensemble decoding, which combines a number of translation systems dynamically at the decoding step. In this paper, we evaluate(More)
This paper describes Kriya – a new statistical machine translation (SMT) system that uses hierarchical phrases, which were first introduced in the Hiero machine translation system (Chi-ang, 2007). Kriya supports both a grammar extraction module for synchronous context-free grammars (SCFGs) and a CKY-based decoder. There are several re-implementations of(More)
Although it has been always thought that Word Sense Dis-ambiguation (WSD) can be useful for Machine Translation, only recently efforts have been made towards integrating both tasks to prove that this assumption is valid, particularly for Statistical Machine Translation (SMT). While different approaches have been proposed and results started to converge in a(More)
Research in Parts-of-Speech (POS) tagset design for European and East Asian languages started with a mere listing of important morphosyntactic features in one language and has matured in later years towards hierarchical tagsets, decomposable tags, common framework for multiple languages (EAGLES) etc. Several tagsets have been developed in these languages(More)
Simultaneous translation is the challenging task of listening to source language speech, and at the same time, producing target language speech. Human interpreters achieve this task routinely and effortlessly , using different strategies in order to minimize the latency in producing target language. Toward modeling the human interpretation process, we(More)
In this paper we focus on the incremental decoding for a statistical phrase-based machine translation system. In incremental decoding, translations are generated incre-mentally for every word typed by a user, instead of waiting for the entire sentence as input. We introduce a novel modification to the beam-search decoding algorithm for phrase-based MT to(More)
We present a novel approach for extracting a minimal synchronous context-free grammar (SCFG) for Hiero-style statistical machine translation using a non-parametric Bayesian framework. Our approach is designed to extract rules that are licensed by the word alignments and heuristically extracted phrase pairs. Our Bayesian model limits the number of SCFG rules(More)