This paper presents a new method to recognize machine-printed traditional Mongolian characters by using back-propagation (BP) neural networks. First, the set of traditional Mongolian characters is divided into five subsets according to each character's position (initial, medial or final) within a word and some steady structural features. Then, each subset …
Recognizing and retrieving the Mongolian Kanjur images requires many preprocessing tasks. In this paper, we concentrate on the binarization of the Mongolian Kanjur images and propose an efficient binarization method for them. The proposed method is applied to each image as follows: First, some preprocessing tasks including …
In this paper, we propose a keyword retrieval system for locating words in historical Mongolian document images. Based on word spotting technology, a collection of historical Mongolian document images is converted into a collection of word images by word segmentation, and a number of profile-based features are extracted to represent the word images. For …
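The abstract above does not say which profile-based features are used. As a rough illustration only, the sketch below computes three profiles commonly used in word spotting (projection, upper and lower profiles) for a small binary word image. The function name `profile_features`, the 0/1 image encoding, and the column-wise scan direction are all assumptions, not details from the paper; because traditional Mongolian is written vertically, a real system might scan along rows instead.

```python
def profile_features(img):
    """Compute simple per-column profiles of a binary word image.

    img is a list of rows, each a list of 0/1 values, where 1 marks ink.
    Returns three lists, one value per column:
      projection: number of ink pixels in the column
      upper:      row index of the first ink pixel (height if the column is empty)
      lower:      row index of the last ink pixel (-1 if the column is empty)
    """
    height = len(img)
    width = len(img[0])
    projection, upper, lower = [], [], []
    for x in range(width):
        # Collect the rows that contain ink in this column, top to bottom.
        ink_rows = [y for y in range(height) if img[y][x]]
        projection.append(len(ink_rows))
        upper.append(ink_rows[0] if ink_rows else height)
        lower.append(ink_rows[-1] if ink_rows else -1)
    return projection, upper, lower


# Hypothetical 3x3 word image used only for illustration.
example = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 1, 0],
]
projection, upper, lower = profile_features(example)
```

Two word images can then be compared by matching their profile sequences, for instance with dynamic time warping, although the matching step used in the paper is not stated in the truncated abstract.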
Many classical Mongolian historical documents are preserved only in image form, which makes them difficult to explore and retrieve. In this paper, we investigate the peculiarities of classical Mongolian documents and propose an approach to recognize the words in them. We design an algorithm to segment the Mongolian words into …
This paper describes our work in the IR4QA subtask. Our IR system designed for this task consists of two modules: (1) query processing; (2) indexing, retrieval and re-ranking. We first study the method of question classification and the weighting strategies based on the result of question classification. Baidu and Wanfang resources are exploited to help …
This paper proposes a knowledge-based system to recognize historical Mongolian documents in which the words exhibit remarkable variation and character overlapping. According to the characteristics of Mongolian word formation, the system combines a holistic scheme and a segmentation-based scheme for word recognition. Several types of words and isolated …
This paper presents a two-stage binarization method for the Mongolian Kanjur images. In the first stage, three popular global thresholding methods are used to remove the background regions from a gray-level Mongolian Kanjur image. In the second stage, the remaining regions of the image are …
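The abstract does not name the three global thresholding methods used in the first stage. As a minimal sketch of what one global method looks like, the code below implements Otsu's method, a widely used choice, on a flat list of gray levels. Choosing Otsu here is an assumption, as are the function names `otsu_threshold` and `binarize`; the second stage is truncated in the source and is not sketched.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    pixels is a flat list of integer gray levels in [0, levels).
    Pixels with value <= t form one class and the rest form the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of gray levels at or below the candidate threshold
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue   # no pixels in the lower class yet
        if w1 == 0:
            break      # all pixels are in the lower class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels, t):
    """Map gray levels to 0 (dark, at or below t) or 1 (bright, above t)."""
    return [1 if p > t else 0 for p in pixels]


# Hypothetical gray levels: dark ink around 10, bright background around 200.
gray = [10] * 50 + [200] * 50
t = otsu_threshold(gray)
mask = binarize(gray, t)
```

A single global threshold like this works best when ink and background form two well-separated intensity groups, which is one plausible reason a method might apply it only to remove clear background before treating the remaining regions separately.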