Iuliu Vasile Konya

Learn More
Print media collections of considerable size are held by cultural heritage organizations and will soon be subject to digitization activities. However, technical content quality management in digitization workflows strongly relies on human monitoring. This heavy human intervention is cost intensive and time consuming, which makes automization mandatory. In(More)
Scanned document images are nowadays becoming available in increasingly higher resolutions. Meanwhile, the variations in image quality within typical document collections increase due to images coming from different scan service providers, time periods or digitization methods. Bi-narization is a crucial first step for many document analysis algorithms.(More)
Reliable and generic methods for skew detection are a necessity for any large-scale digitization projects. As one of the first processing steps, skew detection and correction has a heavy influence on all further document analysis modules, such as geometric and logical layout analysis. This paper introduces a generic, scale-independent algorithm capable of(More)
Large-scale digitization projects aimed at periodicals often have as input streams of completely unlabeled document images. In such situations, the results produced by the automatic segmentation of the document stream into issues heavily influence the overall output quality of a document image analysis system. As a solution to the issue segmentation(More)
This work introduces a practical method for performing logical layout analysis on heterogeneous periodical collections. The described module is incorporated into the Fraunhofer document image understanding system and has been successfully used as part of mass digitization projects on more than 500 000 scanned pages. Our primary target are documents with(More)
An ever-growing amount of digitized content urges libraries and archives to integrate new media types from a large number of origins such as publishers, record labels and film archives, into their existing collections. This is a challenging task, since the multimedia content itself as well as the associated metadata is inherently heterogeneous—the different(More)