Online Large-Margin Training of Syntactic and Structural Translation Features
TLDR
This work explores the use of the MIRA algorithm of Crammer et al. as an alternative to MERT and shows that, with parallel processing and by exploiting more of the parse forest, MIRA can obtain results that match or surpass MERT in both translation quality and computational cost.
Improved Statistical Machine Translation Using Monolingually-Derived Paraphrases
TLDR
This work presents what is, to the authors' knowledge, the first successful integration of a collocational approach to untranslated words with an end-to-end, state-of-the-art SMT system, demonstrating significant translation improvements in a low-resource setting.
Soft Syntactic Constraints for Hierarchical Phrased-Based Translation
TLDR
This work presents an approach that explores the tradeoff between taking advantage of linguistic analysis and allowing the model to exploit linguistically unmotivated mappings learned from parallel training data. It starts with a context-free translation model learned directly from aligned parallel text and then adds soft constituent-level constraints based on parses of the source language.
On Compression-Based Text Classification
TLDR
This work presents the results of a number of experiments designed to evaluate the effectiveness and behavior of different compression-based text classification methods on English text, and specifically designed to test whether the ability to capture non-word features causes character-based text compression methods to achieve more accurate classification.
Dependency Parsing of Modern Standard Arabic with Lexical and Inflectional Features
TLDR
It is shown that parsing quality in the predicted condition can dramatically improve by training in a combined gold+predicted condition, and the contribution of the linguistic knowledge in the tag sets and features identified goes beyond particular experimental settings, and may be informative for other parsers and morphologically rich languages.
Improving Arabic Dependency Parsing with Lexical and Inflectional Morphological Features
TLDR
It is shown that training the parser using a simple regular-expression extension of an impoverished POS tag set with high prediction accuracy does better than using a highly informative POS tag set with only medium prediction accuracy, although the latter performs best on gold input.
Estimating Semantic Distance Using Soft Semantic Constraints in Knowledge-Source – Corpus Hybrid Models
TLDR
A corpus-thesaurus hybrid method is presented that uses soft constraints to generate word-sense-aware distributional profiles (DPs) from coarser "concept DPs" and sense-unaware traditional word DPs, and that is superior to others on word-pair ranking by semantic distance.
Improved Arabic-to-English statistical machine translation by reordering post-verbal subjects for word alignment
TLDR
A novel method for leveraging VS information in SMT is proposed: reordering VS constructions into pre-verbal (SV) order for word alignment, which significantly improves BLEU and TER scores, even on a strong large-scale baseline.
Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment
TLDR
This work proposes to reorder post-verbal subject constructions into SV order for SMT word alignment only, which significantly improves BLEU and TER scores, even on a strong large-scale baseline and despite noisy parses.
Improving Arabic Dependency Parsing with Form-based and Functional Morphological Features
TLDR
It is the first time functional morphological features are used for Arabic NLP, and it is shown that functional gender and number and the related rationality feature improve over form-based features.