Production of parallel training corpora for the development of statistical machine translation (SMT) systems for resource-poor languages usually requires extensive manual effort. Active sample selection aims to reduce the labor, time, and expense incurred in producing such resources, attaining a given performance benchmark with the smallest possible...
We present a novel architecture for providing automated telephone Directory Assistance (DA). The architecture couples a large-vocabulary, statistical n-gram speech recognition engine with a statistical retrieval system. The use of a statistical n-gram allows for the recognition of unconstrained spoken queries while the statistical retrieval engine allows...
In this paper we discuss the design and performance of the BBN Call Director product for automatic call routing and the methodology for its deployment. The component technologies for the BBN Call Director are a statistical n-gram speech recognizer and a statistical topic identification system that, together, provide the framework for processing natural...
Offline handwriting recognition of free-flowing Arabic text is a challenging task due to the plethora of factors that contribute to the variability in the data. In this paper, we address some of these sources of variability, and present experimental results on a large corpus of handwritten documents. Specific techniques such as the application of...
Many feature extraction approaches for off-line handwriting recognition (OHR) rely on accurate binarization of gray-level images. However, high-quality binarization of most real-world documents is extremely difficult due to the varying characteristics of noise artifacts common in such documents. Unlike most of these features, Gabor features do not require...