Michael A. Moll

The ScatterType CAPTCHA, designed to resist character-segmentation attacks and shown to be highly legible to human readers, is analyzed for vulnerabilities and is offered for experiments in automatic attack. As introduced in [BR05], 'ScatterType' challenges are images of machine-print text whose characters are cut into pieces which then drift apart, in an ...
We discuss problems in developing policies for ground truthing document images for pixel-accurate segmentation. First, we describe ground truthing policies that apply to four different scales: (1) paragraph, (2) text line, (3) character, and (4) pixel. We then analyze difficult and/or ambiguous cases that will challenge any policy, e.g. blank space, ...
We describe a methodology for retrieving document images from large, extremely diverse collections. First, we perform content extraction, that is, the location and measurement of regions containing handwriting, machine-printed text, photographs, blank space, etc., in documents represented as bilevel, greylevel, or color images. Recent experiments have shown ...
A CAPTCHA which humans find to be highly legible and which is designed to resist automatic character-segmentation attacks is described. As first detailed in [BR05], these 'ScatterType' challenges are images of machine-print text whose characters have been pseudorandomly cut into pieces which have then been forced to drift apart. This scattering is designed ...
We offer a preliminary report on a research program to investigate versatile algorithms for document image content extraction, that is, locating regions containing handwriting, machine-print text, graphics, line-art, logos, photographs, noise, etc. Solving this problem in its full generality requires coping with a vast diversity of document and image types. ...
We report an investigation into strategies, algorithms, and software tools for document image content extraction and inventory, that is, the location and measurement of regions containing handwriting, machine-printed text, photographs, blank space, etc. We have developed automatically trainable methods, adaptable to many kinds of documents represented as ...