Learn More
We present the results of an experiment on extending the automatic method of Machine Translation evaluation BLUE with statistical weights for lexical items, such as tf.idf scores. We show that this extension gives additional information about evaluated texts; in particular it allows us to measure translation Adequacy, which, for statistical MT systems, is(More)
The extraction of dictionaries from parallel text corpora is an established technique. However, as parallel corpora are a scarce resource, in recent years the extraction of dictionaries using comparable corpora has obtained increasing attention. In order to find a mapping between languages, almost all approaches suggested in the literature rely on a seed(More)
Lack of sufficient parallel data for many languages and domains is currently one of the major obstacles to further advancement of automated translation. The ACCURAT project is addressing this issue by researching methods how to improve machine translation systems by using comparable corpora. In this paper we present tools and techniques developed in the(More)
In this paper we present and evaluate three approaches to measure comparability of documents in non-parallel corpora. We develop a task-oriented definition of comparability , based on the performance of automatic extraction of translation equivalents from the documents aligned by the proposed metrics, which formalises intuitive definitions of comparability(More)
In this paper we present a tool that uses comparable corpora to find appropriate translation equivalents for expressions that are considered by translators as difficult. For a phrase in the source language the tool identifies a range of possible expressions used in similar contexts in target language corpora and presents them to the translator as a list of(More)
We present the design and evaluation of a translator's amenuensis that uses comparable corpora to propose and rank non-literal solutions to the translation of expressions from the general lexicon. Using dis-tributional similarity and bilingual dictionaries , the method outperforms established techniques for extracting translation equivalents from parallel(More)
In this paper we present a metric that measures comparability of documents across different languages. The metric is developed within the FP7 ICT ACCURAT project, as a tool for aligning comparable corpora on the document level; further these aligned comparabl e documents are used for phrase alignment and extraction of translation equivalents, with the aim(More)
The output of state-of-the-art machine translation (MT) systems could be useful for certain NLP tasks, such as Information Extraction (IE). However, some unresolved problems in MT technology could seriously limit the usability of such systems. For example robust and accurate word sense disambiguation, which is essential for the performance of IE systems, is(More)
Automatic methods for MT evaluation are often based on the assumption that MT quality is related to some kind of distance between the evaluated text and a professional human translation (e.g., an edit distance or the precision of matched N-grams). However, independently produced human translations are necessarily different, conveying the same content by(More)