Sylvain Raybaud

Principal Component Analysis (PCA) is a dimensionality reduction technique well suited to processing data from sensor networks. It can be applied to tasks such as compression, event detection, and event recognition. The technique is based on a linear transform in which the sensor measurements are projected onto a set of principal components. When sensor …
A machine-translated sentence is seldom completely correct. Confidence measures are designed to detect incorrect words, phrases or sentences, or to provide an estimate of the probability of correctness. In this article we describe several word- and sentence-level confidence measures relying on different features: mutual information between words, n-gram and …
In this paper we present the system we submitted to the WMT12 shared task on Quality Estimation. Each translated sentence is given a score between 1 and 5. The score is obtained using several numerical or boolean features calculated according to the source and target sentences. We perform a linear regression of the feature space against scores in the range …
A confidence measure estimates the reliability of a hypothesis provided by a machine translation system. The confidence measure problem can be seen as a testing process: we want to decide whether the most probable sequence of words provided by the machine translation system is correct or not. In the following we describe several original …
Machine translation systems are not reliable enough to be used “as is”: except for the simplest tasks, they can only be used to grasp the general meaning of a text or to assist human translators. The purpose of confidence measures is to detect erroneous words or sentences produced by a machine translation system. In this article, after reviewing the …
In this paper we present a twofold contribution to Confidence Measures for Machine Translation. First, in order to train and test confidence measures, we present a method to automatically build corpora containing realistic errors. Errors introduced into reference translations simulate classical machine translation errors (word deletion and word …
We present S2TT, an integrated speech-to-text translation system based on POCKETSPHINX and MOSES. It is compared to different baselines based on ANTS (the broadcast news transcription system developed at LORIA’s Speech group), MOSES and Google’s translation tools. A small corpus of reference transcriptions of broadcast news from the evaluation campaign …