Data Set Used
Statistical MT has made great progress in the last few years, but current translation models are weak on reordering and target language fluency. Syntactic approaches seek to remedy these problems. In this paper, we take the framework for acquiring multi-level syntactic translation rules of (Galley et al., 2004) from aligned tree-string pairs, and present…
We use the Margin Infused Relaxed Algorithm of Crammer et al. to add a large number of new features to two machine translation systems: the Hiero hierarchical phrase-based translation system and our syntax-based translation system. On a large-scale Chinese-English translation task, we obtain statistically significant improvements of +1.5 Bleu and +1.1 Bleu,…
We introduce SPMT, a new class of statistical Translation Models that use Syntactified target language Phrases. The SPMT models outperform a state-of-the-art phrase-based baseline model by 2.64 Bleu points on the NIST 2003 Chinese-English test corpus and 0.28 points on a human-based quality metric that ranks translations on a scale from 1 to 5.
We compare and contrast the strengths and weaknesses of a syntax-based machine translation model with a phrase-based machine translation model on several levels. We briefly describe each model, highlighting points where they differ. We include a quantitative comparison of the phrase pairs that each model has to work with, as well as the reasons why some…
This article shows that the structure of bilingual material from standard parsing and alignment tools is not optimal for training syntax-based statistical machine translation (SMT) systems. We present three modifications to the MT training data to improve the accuracy of a state-of-the-art syntax MT system: restructuring changes the syntactic structure of…
We show that phrase structures in Penn Treebank style parses are not optimal for syntax-based machine translation. We exploit a series of binarization methods to restructure the Penn Treebank style trees such that syntactified phrases smaller than Penn Treebank constituents can be acquired and exploited in translation. We find that by employing the EM…
We present a probabilistic bilingual capitalization model for capitalizing machine translation outputs using conditional random fields. Experiments carried out on three language pairs and a variety of experiment conditions show that our model significantly outperforms a strong monolingual capitalization model baseline, especially when working with small…