Guy A. Story, H. V. Jagadish, AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, New Jersey 07974 USA, (908) 582-5571; story@allegra.att.com

Abstract

Figure 4: Percept structures for journal pages

… various reps allow. The journal issue itself is a percept (i.e., a multirep) that combines all of the page percepts. Constituent percepts and/or reps may be reused to provide other useful online percepts. For example, all of the figures may be collected into a "figure album". In Figure 5 we suggest a number of different composite percepts that could be built from the piece-parts of the original journal.

The RightPages example illustrates the application of our model to captured data whose structure and "meaning" must be discovered through recognition transformations. The same data model is equally appropriate in a multimedia authoring environment, where the percept concept allows the same flexibilities and opportunities for data reuse. In this scenario, rendering transformations produce display reps and possibly other reps as well.

9 Conclusion

Electronic documents in general, and multimedia documents in particular, are likely to have multiple representations of the same information. In this paper, we have developed a data model for managing these multiple representations in a cohesive fashion. This model is compatible with, and easily implemented on top of, an object-oriented framework. The model described herein has been used as a basis for the RightPages electronic document management system.

10 Acknowledgements

The authors wish to acknowledge Larry O'Gorman and Dave Fox for many helpful discussions.

Figure 3: The RightPages interface

… view reflects the page's physical structure; its components are omitted from the figure. Figure 4 shows part of the structure of the percept for the table of contents page, including a number of composite reps created by the recognition transformations. The hardcopy bitmap is created by the scanner (capture device); it is the initial input to recognition and is saved for printing.
A lossy transformation (filtering and subsampling) creates the screen bitmap from the hardcopy bitmap. Regions and subregions of both bitmaps are coupled with other reps, and the coupled reps are available as reps of the constituent percepts that compose the page percept. This allows a constituent percept to be, for example, viewed or printed. An initial recognition transformation segments the hardcopy bitmap into text and graphics and their spatial relationships. Further recognition on text regions isolates characters, blank-delimited strings, and blocks and lines of text. These comprise the physical rep, which is much like the layout …
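
The relationship between percepts, reps, and a lossy transformation can be sketched in code. The following is a minimal illustration in Python and not the RightPages implementation: the class names `Rep` and `Percept`, the rep kind strings, and the helper functions are all hypothetical, and a simple box filter (tile averaging) stands in for the filtering step, which the text does not specify.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Rep:
    """One representation of some information (hypothetical structure)."""
    kind: str                           # e.g. "hardcopy_bitmap", "screen_bitmap"
    data: Any
    derived_from: Optional[str] = None  # kind of the rep this one was computed from


@dataclass
class Percept:
    """A multirep: several reps of the same information, plus constituent percepts."""
    name: str
    reps: Dict[str, Rep] = field(default_factory=dict)
    parts: List["Percept"] = field(default_factory=list)

    def add_rep(self, rep: Rep) -> None:
        self.reps[rep.kind] = rep


def subsample(bitmap: List[List[int]], factor: int = 2) -> List[List[int]]:
    """Lossy transformation: average each factor-by-factor tile into one pixel.

    Assumed box filter; rows and columns that do not fill a whole tile are dropped.
    """
    rows, cols = len(bitmap), len(bitmap[0])
    out = []
    for r in range(0, rows - rows % factor, factor):
        row = []
        for c in range(0, cols - cols % factor, factor):
            tile = [bitmap[r + dr][c + dc]
                    for dr in range(factor) for dc in range(factor)]
            row.append(sum(tile) // len(tile))  # integer mean of the tile
        out.append(row)
    return out


def make_screen_rep(page: Percept, factor: int = 2) -> Rep:
    """Derive a screen bitmap rep from the page's hardcopy bitmap rep."""
    hard = page.reps["hardcopy_bitmap"]
    screen = Rep("screen_bitmap", subsample(hard.data, factor),
                 derived_from=hard.kind)
    page.add_rep(screen)
    return screen


# A tiny 4x4 "scan" standing in for the hardcopy bitmap of a contents page.
toc = Percept("toc_page")
toc.add_rep(Rep("hardcopy_bitmap", [[0, 0, 255, 255],
                                    [0, 0, 255, 255],
                                    [255, 255, 0, 0],
                                    [255, 255, 0, 0]]))
make_screen_rep(toc)

# The journal issue is itself a percept that combines the page percepts.
issue = Percept("journal_issue", parts=[toc])
```

Here the hardcopy bitmap stays attached to the page percept for printing, while the derived screen bitmap records which rep it came from. A composite percept such as a figure album could be built the same way, by placing existing constituent percepts in the `parts` list of a new `Percept`.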

