Sylvain Raybaud

Learn More
In this paper we present the system we submitted to the WMT12 shared task on Quality Estimation. Each translated sentence is given a score between 1 and 5. The score is obtained using several numerical or boolean features calculated according to the source and target sentences. We perform a linear regression of the feature space against scores in the range(More)
The Principal Component Analysis (PCA) is a data dimensionality reduction technique well-suited for processing data from sensor networks. It can be applied to tasks like compression, event detection, and event recognition. This technique is based on a linear transform where the sensor measurements are projected on a set of principal components. When sensor(More)
A machine translated sentence is seldom completely correct. Confidence measures are designed to detect incorrect words, phrases or sentences, or to provide an estimation of the probability of correctness. In this article we describe several word-and sentence-level confidence measures relying on different features: mutual information between words, n-gram(More)
A confidence measure is able to estimate the reliability of an hypothesis provided by a machine translation system. The problem of confidence measure can be seen as a process of testing : we want to decide whether the most probable sequence of words provided by the machine translation system is correct or not. In the following we describe several original(More)
We present in this paper a twofold contribution to Confidence Measures for Machine Translation. First, in order to train and test confidence measures, we present a method to automatically build corpora containing realistic errors. Errors introduced into reference translation simulate classical machine translation errors (word deletion and word(More)
We present S2TT, an integrated speech-to-text translation system based on POCKET-SPHINX and MOSES. It is compared to different baselines based on ANTS — the broadcast news transcription system developed at LORIA's Speech group, MOSES and Google's translation tools. A small corpus of reference transcriptions of broadcast news from the evaluation campaign(More)
  • 1