In this work we present Cunei, a hybrid, open-source platform for machine translation that models each example of a phrase-pair at run-time and combines them in dynamic collections. This results in a flexible framework that supports consistent modeling and the use of non-local features.
The Cunei Machine Translation Platform is an open-source system for data-driven machine translation. Our platform is a synthesis of the traditional Example-Based MT (EBMT) and Statistical MT (SMT) paradigms. What makes Cunei unique is that it measures the relevance of each translation instance with a distance function. This distance function, represented as…
Example-Based Machine Translation (EBMT) is limited by the quantity and scope of its training data. Even with a reasonably large corpus, we will not have examples that cover everything we want to translate. This problem is especially severe in Arabic due to its rich morphology. We demonstrate a novel method that exploits the regular nature of Arabic…
In this work I look at two different paradigms of Example-Based Machine Translation (EBMT). I combine the strengths of these two systems and build a new EBMT engine that combines sub-phrasal matching with structural templates. This synthesis results in higher translation quality and more graceful degradation, yielding 1.5% to 7.5% relative improvement in…
The Cunei Machine Translation Platform is an open-source MT system designed to model instances of translation. One of the challenges to this approach is effective training. We describe two techniques that improve the training procedure and allow us to leverage the strengths of instance-based modeling. First, during training we approximate our model with a…
Machine translation has advanced considerably in recent years, primarily due to the availability of larger datasets. However, one cannot rely on the availability of copious, high-quality bilingual training data. In this work, we improve upon the state-of-the-art in machine translation with an instance-based model that scores each instance of translation in…
Acknowledgment: This work is supported, in part, by the Human Language Technology Center of Excellence. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsor.
We describe how CERA, the Complex Event Recognition Architecture, was used to create a multimodal user interface for Skibbles, a memory game moderated by a mobile robot. We also announce the availability of open-source software that implements CERA, and how it can be used to build intelligent multimodal interfaces.
Machine translation has advanced considerably in recent years, but primarily due to the availability of larger data sets. Translation of low-frequency phrases and resource-poor languages is still a serious problem. In this work we explore a deeper integration of context, structure, and similarity within machine translation. Instead of modeling phrase pairs…