Jose Ramón Navarro-Cerdán

Quality estimation (QE) for machine translation is usually addressed as a regression problem where a learning model is used to predict a quality score from a (usually highly redundant) set of features that represent the translation. This redundancy hinders model learning, and thus penalizes the performance of quality estimation systems. We propose different ...
In this paper, an OCR post-processing method that combines a language model, OCR hypothesis information and an error model is proposed. The approach can be seen as a flexible and efficient way to perform Stochastic Error-Correcting Language Modeling. We use Weighted Finite-State Transducers (WFSTs) to represent the language model, the complete set of OCR ...
In an automatic handwritten form processing system, it is often necessary to use the lexical or linguistic restrictions present in the field contents in order to obtain acceptable recognition rates. Since each field is known to hold a given kind of information (name, address...), a language model can be defined for it. But often, in a typical form, there are ...
In an OCR post-processing task, a language model is used to find the best transformation of the OCR hypothesis into a string compatible with the language. The cost of this transformation is used as a confidence value to reject the strings that are less likely to be correct, and the error rate of the accepted strings should be strictly controlled by the ...
In this work, a method for the automatic estimation of a threshold that allows the user of an OCR system to define an expected error rate is presented. When the OCR output is post-processed using a language model, a probability, a reliability index (or a "transformation cost") is usually obtained, reflecting the likelihood (or its inverse) that the string ...
This paper proposes a generic Symbol Input Correction Method for Human-Machine Interfaces that combines a Language Model, the Input Hypothesis information and an Error Model. It is especially useful for embedded devices, where the input subsystem is often size-constrained. The approach can be seen as a flexible and efficient way to perform Stochastic ...
We present a method to estimate the quality of automatic translations when reference translations are not available. Quality estimation is addressed as a two-step regression problem where multiple features are combined to predict a quality score. Given a set of features, we aim at automatically extracting the variables that better explain translation ...
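As a rough illustration of the rejection-threshold idea mentioned in the OCR abstracts above, the sketch below picks the largest transformation cost that keeps the error rate of accepted strings within a target. It is not the authors' method: it assumes a labelled validation set of `(cost, is_correct)` pairs is available, and the names `estimate_threshold` and `target_error_rate` are hypothetical.

```python
from typing import List, Optional, Tuple


def estimate_threshold(
    samples: List[Tuple[float, bool]],
    target_error_rate: float,
) -> Optional[float]:
    """Return the largest cost t such that accepting every string with
    cost <= t keeps the empirical error rate at or below the target.

    samples: hypothetical validation data, (transformation cost, is_correct).
    Returns None when no threshold satisfies the target.
    """
    ordered = sorted(samples, key=lambda s: s[0])
    best = None
    errors = 0
    i = 0
    n = len(ordered)
    while i < n:
        cost = ordered[i][0]
        # Strings with equal cost are accepted or rejected together.
        while i < n and ordered[i][0] == cost:
            if not ordered[i][1]:
                errors += 1
            i += 1
        # i is now the number of strings accepted at this threshold.
        if errors / i <= target_error_rate:
            best = cost
    return best


validation = [(0.1, True), (0.2, True), (0.3, False), (0.4, True), (0.5, False)]
threshold = estimate_threshold(validation, target_error_rate=0.25)
```

Strings whose cost exceeds the returned threshold would be rejected. A lower target yields a stricter threshold, and `None` signals that even the cheapest strings are too error-prone on the validation data.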