Learn More
We report the results of our experiments in the context of the NEWS 2015 Shared Task on Transliteration. We focus on methods of combining multiple base systems , and leveraging transliterations from multiple languages. We show error reductions over the best base system of up to 10% when using supplemental translitera-tions, and up to 20% when using system(More)
Sentiment analysis research has predominantly been on English texts. Thus there exist many sentiment resources for English, but less so for other languages. Approaches to improve sentiment analysis in a resource-poor focus language include: (a) translate the focus language text into a resource-rich language such as English, and apply a powerful English(More)
We present a shared task on automatically determining sentiment intensity of a word or a phrase. The words and phrases are taken from three domains: general English, English Twit-ter, and Arabic Twitter. The phrases include those composed of negators, modals, and degree adverbs as well as phrases formed by words with opposing polarities. For each of the(More)
When text is translated from one language into another, sentiment is preserved to varying degrees. In this paper, we use Arabic social media posts as stand-in for source language text, and determine loss in sentiment predictability when they are translated into En-glish, manually and automatically. As benchmarks , we use manually and automatically(More)
We apply Support Vector Machines to differentiate between 11 native languages in the 2013 Native Language Identification Shared Task. We expand a set of common language identification features to include cognate interference and spelling mistakes. Our best results are obtained with a classifier which includes both the cognate and the misspelling features,(More)
Morphological segmentation is an effective sparsity reduction strategy for statistical machine translation (SMT) involving morphologically complex languages. When translating into a segmented language , an extra step is required to deseg-ment the output; previous studies have de-segmented the 1-best output from the de-coder. In this paper, we expand our(More)
Existing Arabic sentiment lexicons have low coverage—only a few thousand entries. In this paper, we present several large sentiment lexicons that were automatically generated using two different methods: (1) by using distant supervision techniques on Arabic tweets, and (2) by translating English sentiment lexicons into Arabic using a freely available(More)
Morphological segmentation is an effective strategy for addressing difficulties caused by morphological complexity. In this study, we use an English-to-Arabic test bed to determine what steps and components of a phrase-based statistical machine translation pipeline benefit the most from segmenting the target language. We test several scenarios that differ(More)
Morphological tokenization has been used in machine translation for morphologically complex languages to reduce lexical sparsity. Unfortunately, when translating into a morphologically complex language, recombining segmented tokens to generate original word forms is not a trivial task, due to morphological , phonological and orthographic adjustments that(More)