This article presents an overview of the shared task held as part of the TweetMT workshop at SEPLN 2015. The task consisted of translating collections of tweets to and from several languages. The article outlines the data collection and annotation process, the development and evaluation of the shared task, and the results achieved by…
In this paper we introduce TweetNorm_es, an annotated corpus of Spanish tweets, which we make publicly available under the terms of the CC-BY license. This corpus is intended for the development and testing of microtext normalization systems. It was created for Tweet-Norm, a tweet normalization workshop and shared task, and is the result of a joint…
Language identification, the task of determining the language a given text is written in, has progressed substantially in recent decades. However, three main issues remain unresolved: (i) distinguishing similar languages, (ii) detecting multilingualism in a single document, and (iii) identifying the language of short texts. In this paper, we…
This paper argues in favor of a linguistically informed error classification for SMT to identify system weaknesses and map them to possible syntactic, semantic and structural fixes. We propose a scheme that includes both linguistically oriented error categories and SMT-oriented edit errors, and evaluate an English-Spanish system and an English-Basque…
The goal of this FP7 European project is to contribute to the advancement of quality machine translation by pursuing an approach that relies further on semantics, deep parsing and linked open data. The QTLeap project (Quality Translation by Deep Language Engineering Approaches) is a collaborative project funded by the European Commission…
This work compares the post-editing productivity of professional translators and lay users. We integrate an English-to-Basque MT system within the Bologna Translation Service, an end-to-end translation management platform, and perform a productivity experiment in a real working environment. Six translators and six lay users translate or post-edit two texts from…
We introduce TweetMT, a parallel corpus of tweets in four language pairs that combine five languages (Spanish from/to Basque, Catalan, Galician and Portuguese), all of which have an official status in the Iberian Peninsula. The corpus has been created by combining automatic collection and crowdsourcing approaches, and it is publicly available. It is…