Felipe Sánchez-Martínez

This paper describes a method for the automatic inference of structural transfer rules to be used in a shallow-transfer machine translation (MT) system from small parallel corpora. The structural transfer rules are based on alignment templates, like those used in statistical MT. Alignment templates are extracted from sentence-aligned parallel corpora and …
We compare different strategies for applying statistical machine translation techniques to retrieve documents that are plausible translations of a given source document. Finding the translated version of a document is a relevant task, for example, when building a corpus of parallel texts that can help to create and to evaluate new machine translation …
Large bilingual parallel texts (also known as bitexts) are usually stored in a compressed form, and previous work has shown that they can be compressed more efficiently if the fact that the two texts are mutual translations is exploited. For example, a bitext can be seen as a sequence of biwords, that is, pairs of parallel words with a high probability of …
In this paper we present a new and simple method for using sources of bilingual information for word alignment between parallel segments of text. This method can be used on the fly, since it does not need to be trained. In addition, it can also be applied on comparable corpora. We compare our method to the state-of-the-art tool GIZA++, widely used for word …
This paper describes the current status of development of an open-source shallow-transfer machine translation (MT) system for the [European] Portuguese ↔ Spanish language pair, developed using the OpenTrad Apertium MT toolbox (www.apertium.org). Apertium uses finite-state transducers for lexical processing, hidden Markov models for part-of-speech tagging, …
A bitext, or bilingual parallel corpus, consists of two texts, each one in a different language, that are mutual translations. Bitexts are very useful in linguistic engineering because they are used as a source of knowledge for different purposes. In this paper we propose a strategy to efficiently compress and use bitexts, saving not only space but also …
The amount of information that is stored in digital form in more than one language is growing very fast as a consequence of globalization. Furthermore, there are countries and supra-national entities whose legislation enforces the translation (and storage) of all official texts into all their official languages. Two texts that are mutual …
When automatically translating between related languages, one of the main sources of machine translation errors is the incorrect resolution of part-of-speech (PoS) ambiguities. Hidden Markov models (HMMs) are the standard statistical approach to resolving such ambiguities. The usual training algorithms collect statistics from source-language …
This paper describes a new method for cross-lingual textual entailment (CLTE) detection based on machine translation (MT). We use sub-segment translations from different MT systems available online as a source of cross-lingual knowledge. In this work we describe and evaluate different features derived from these sub-segment translations, which are used by a …