Liangrui Peng

Research on offline handwritten Arabic character recognition has received increasing attention in recent years because of the growing need for Arabic document digitization. The variation in Arabic handwriting makes character segmentation and recognition very difficult, e.g., the subparts (diacritics) of the Arabic character may shift away...
Writer recognition is an important branch of biometrics. In our previous research, a Grid Micro-structure Feature (GMSF) based text-independent and script-independent method was adopted and high performance was obtained. However, this method is sensitive to pen-width variation in practical situations. To solve this problem, an inner and inter class...
In a country like India, where different scripts are in use, automatic identification of handwritten script facilitates many important applications, such as automatic transcription of multilingual documents and the selection of script-specific OCR in a multilingual environment. Existing script identification techniques depend on various features extracted...
Arabic is a cursive script, and the characteristics of Arabic texts differ greatly from those of Latin or Chinese. For example, an Arabic character has up to four written forms, and characters that can be joined are always connected on the baseline. Therefore, the methods used for Arabic document recognition are different from those for Latin and Chinese, where...
This paper presents research work on multilingual document recognition technology and its application in China, which is useful for building multilingual digital libraries. The multilingual OCR (optical character recognition) key technologies and general system framework are summarized based on the previous research work for Chinese, Japanese, Korean,...
We introduce research on document digitization technology and its applications for constructing digital libraries in China. We focus on two major objectives of document digitization technologies: performance and efficiency. Taking the most representative TH-OCR product as an example, the up-to-date research achievements on both kernel OCR technologies...
HMM-based analytical methods have been widely used for Arabic handwriting recognition. A key factor influencing the performance of HMM-based systems is the features extracted from a sliding window. In this paper, we propose a novel baseline-independent feature set extracted from a wider sliding window to directly capture the contextual information. This...
Mongolian is one of the most common written languages in China, Mongolia, and Russia. Many printed Mongolian documents still remain to be digitized for digital library applications. The traditional Mongolian script has a unique vertical cursive writing style and multiple font variations, which makes Mongolian Optical Character Recognition challenging. As...
Historical Chinese character recognition has been a challenging topic in the pattern recognition field because of the large character set, varied writing styles, and lack of training samples. In this paper, we applied the Style Transfer Mapping (STM) method to historical Chinese character recognition. Optimal selection of parameters was discussed. Two sets of...