Learn More
This paper presents a phrase-based statistical machine translation method, based on non-contiguous phrases, i.e. phrases with gaps. A method for producing such phrases from a word-aligned corpora is proposed. A statistical translation model is also presented that deals such phrases, as well as a training method based on the maximization of translation(More)
This paper addresses the task of handling unknown terms in SMT. We propose using source-language monolingual models and resources to paraphrase the source text prior to translation. We further present a conceptual extension to prior work by allowing translations of entailed texts rather than paraphrases only. A method for performing this process efficiently(More)
We describe a dataset containing 16,000 translations produced by four machine translation systems and manually annotated for quality by professional translators. This dataset can be used in a range of tasks assessing machine translation evaluation metrics, from basic correlation analysis to training and test of machine learning-based metrics. By providing a(More)
In many languages the use of compound words is very productive. A common practice to reduce sparsity consists in splitting compounds in the training data. When this is done, the system incurs the risk of translating components in non-consecutive positions, or in the wrong order. Furthermore, a post-processing step of compound merging is required to(More)
We present a general model for PP attachment resolution and NP analysis in French. We make explicit the different assumptions our model relies on, and show how it generalizes previously proposed models. We then present a series of experiments conducted on a corpus of newspaper articles, and assess the various components of the model, as well as the(More)
This paper describes the algorithms implemented by the KerMIT consortium for its participation in the Trec 2002 Filtering track. The consortium submitted runs for the routing task using a linear SVM, for the batch task using the same SVM in combination with an innovative threshold-selection mechanism , and for the adaptive task using both a second-order(More)
We describe an approach for filtering phrase tables in a Statistical Machine Translation system , which relies on a statistical independence measure called Noise, first introduced in (Moore, 2004). While previous work by (Johnson et al., 2007) also addressed the question of phrase table filtering, it relied on a simpler independence measure, the p-value,(More)
Attending recent computational linguistics conferences, it is hard to ignore the phenomenal amount of research devoted to statistical machine translation (SMT). Driven by the wide availability of open-source translation systems, corpora, and evaluation tools, a research area that was once the preserve of large research groups has become accessible to those(More)