Felipe Sánchez-Martínez

Learn More
Apertium is a free/open-source platform for rule-based machine translation. It is being widely used to build machine translation systems for a variety of language pairs, especially in those cases (mainly with related-language pairs) where shallow transfer suffices to produce good quality translations, although it has also proven useful in assimilation(More)
We present the current status of development of an open-source shallow-transfer machine translation engine for the Romance languages of Spain (the main ones being Spanish, Catalan and Galician) as part of a larger government-funded project which includes non-Romance languages such as Basque and involving both universities and linguistic technology(More)
This paper describes the current status of development of an open-source shallow-transfer machine translation (MT) system for the [European] Portuguese ↔ Spanish language pair, developed using the OpenTrad Apertium MT toolbox (www.apertium.org). Apertium uses finite-state transducers for lexical processing, hidden Markov models for part-of-speech tagging,(More)
Large bilingual parallel texts (also known as bitexts) are usually stored in a compressed form, and previous work has shown that they can be more efficiently compressed if the fact that the two texts are mutual translations is exploited. For example, a bitext can be seen as a sequence of biwords —pairs of parallel words with a high probability of(More)
This paper describes the resources available in the Apertium platform, a free/open-source framework for creating rule-basedmachine translation systems. Resources within the platform take the form of finite-state morphologies for morphological analysis and generation, bilingual transfer lexica, probabilistic part-of-speech taggers and transfer rule files,(More)
A bitext, or bilingual parallel corpus, consists of two texts, each one in a different language, that are mutual translations. Bitexts are very useful in linguistic engineering because they are used as source of knowledge for different purposes. In this paper we propose a strategy to efficiently compress and use bitexts, saving, not only space, but also(More)
By the time Machine Translation Summit X is held in September 2005, our group will have released an open-source machine translation toolbox as part of a large government-funded project involving four universities and three linguistic technology companies from Spain. The machine translation toolbox, which will most likely be released under a GPL-like license(More)
This paper describes a method for the automatic inference of structural transfer rules to be used in a shallow-transfer machine translation (MT) system from small parallel corpora. The structural transfer rules are based on alignment templates, like those used in statistical MT. Alignment templates are extracted from sentence-aligned parallel corpora and(More)
In this paper we describe a module (rule formalism, rule compiler and rule processor) designed to provide flexible support for lexical selection in rule-based machine translation. The motivation and implementation for the system is outlined and an efficient algorithm to compute the best coverage of lexical-selection rules over an ambiguous input sentence is(More)
This paper describes a new method for crosslingual textual entailment (CLTE) detection based on machine translation (MT). We use sub-segment translations from different MT systems available online as a source of crosslingual knowledge. In this work we describe and evaluate different features derived from these sub-segment translations, which are used by a(More)