Learn More
Automatic word alignment plays a critical role in statistical machine translation. Unfortunately the relationship between alignment quality and statistical machine translation performance has not been well understood. In the recent literature the alignment task has frequently been decoupled from the translation task, and assumptions have been made about(More)
We describe a methodology for rapid experimentation in statistical machine translation which we use to add a large number of features to a baseline system exploiting features from a wide range of levels of syntactic representation. Feature values were combined in a log-linear model to select the highest scoring candidate translation from an n-best list.(More)
Word alignment is the problem of annotating parallel text with translational correspondence. Previous generative word alignment models have made structural assumptions such as the 1-to-1, 1-toN , or phrase-based consecutive word assumptions, while previous discriminative models have either made such an assumption directly or used features derived from a(More)
We present a novel machine translation model which models translation by a linear sequence of operations. In contrast to the " N-gram " model, this sequence includes not only translation but also reordering operations. Key ideas of our model are (i) a new reordering approach which better restricts the position to which a word or phrase can be moved, and is(More)
The phrase-based and N-gram-based SMT frameworks complement each other. While the former is better able to memorize , the latter provides a more principled model that captures dependencies across phrasal boundaries. Some work has been done to combine insights from these two frameworks. A recent successful attempt showed the advantage of using phrase-based(More)
We address the problem of unsupervised and language-pair independent alignment of symmetrical and asymmetrical parallel corpora. Asymmetrical parallel corpora contain a large proportion of 1-to-0/0-to-1 and 1-to-many/many-to-1 sentence correspondences. We have developed a novel approach which is fast and allows us to achieve high accuracy in terms of F 1(More)
N-gram-based models co-exist with their phrase-based counterparts as an alternative SMT framework. Both techniques have pros and cons. While the N-gram-based framework provides a better model that captures both source and target contexts and avoids spurious phrasal segmentation, the ability to memorize and produce larger translation units gives an edge to(More)
Many different types of features have been shown to improve accuracy in parse reranking. A class of features that thus far has not been considered is based on a projection of the syntactic structure of a translation of the text to be parsed. The intuition for using this type of bitext projection feature is that ambiguous structures in one language often(More)