Karim Hadjar

The aim of layout analysis is to extract the geometric structure from a document image. It consists of labeling homogeneous regions of a document image. This paper describes the performance of segmentation algorithms and their adaptation to handle Arabic documents with complex structures, such as newspapers. Experimental tests have been carried out on four …
Accessing the structured content of a PDF document is a difficult task, requiring pre-processing and reverse engineering techniques. In this paper, we first present different methods to accomplish this task, which are based either on document image analysis or on electronic content extraction. Then, XCDF, a canonical format with well-defined properties, is …
PDF has become a very common format for exchanging printable documents. Further, it can be easily generated from the major document formats, which makes a huge number of PDF documents available over the net. However, its use is limited to displaying and printing, which considerably reduces the search and retrieval capabilities. For this reason, additional tools …
This paper describes PLANET, a recognition method for Arabic documents with complex structures that allows incremental learning in an interactive environment. The classification is driven by artificial neural networks, each specialized in a document model. The first prototype of PLANET has been tested on five different phases of newspaper …
This article presents Xed, a reverse engineering tool for PDF documents that extracts the original document layout structure. Xed combines electronic extraction methods with state-of-the-art document analysis techniques and outputs the layout structure in a hierarchical canonical form, i.e. one that is universal and independent of the document type. This …
This paper describes 2(CREM), a recognition method for documents with complex structures that allows incremental learning in an interactive environment. The classification is driven by a model that contains a static as well as a dynamic part and evolves with use. The first prototype of 2(CREM) has been tested on four different phases of newspaper …
In this paper, we propose a new way of writing web reviews. The problem lies in the static structure of web reviews. Traditionally, a site administrator or user writes a review and enters scores on a review website. Once the content is ready to be publicly viewed, the database simply retrieves the information and displays it as is. This …