Mapping and displaying structural transformations between XML and PDF

@inproceedings{Hardy2002MappingAD,
  title={Mapping and displaying structural transformations between XML and PDF},
  author={Matthew R. B. Hardy and David F. Brailsford},
  booktitle={DocEng '02},
  year={2002}
}
Documents are often marked up in XML-based tagsets to delineate major structural components such as headings, paragraphs, figure captions and so on, without much regard to their eventual displayed appearance. And yet these same abstract documents, after many transformations and 'typesetting' processes, often emerge in the popular format of Adobe PDF, either for dissemination or archiving. Until recently PDF has been a totally display-based document representation, relying on the underlying…

Enhancing composite digital documents using XML-based standoff markup
TLDR
A composite document approach wherein an XML-based document representation is linked via a 'shadow tree' of bi-directional pointers to a PDF representation of the same document, thereby enabling the treatment of specialist material via standard tools working within the XML representation.
Enhancing Composite Digital Documents Using XML-based Standoff Markup
TLDR
This work presents a composite document approach wherein an XML-based document representation is linked via a ‘shadow tree’ of bi-directional pointers to a PDF representation of the same document.
The Mars project: PDF in XML
TLDR
The Mars document format is based on the fundamental structures of PDF, but uses an XML syntax to represent the document, as well as incorporating additional industry standards such as SVG, PNG, JPG, JPG2000 and OpenType.
TRACKING SUB-PAGE COMPONENTS WITHIN DOCUMENT WORKFLOWS
TLDR
A collection of tools that allow information about the various transformations to be embedded at each stage in the workflow, together with a visualization tool that uses this embedded information to display the relationships between the various intermediate documents.
Tracking sub-page components in document workflows
TLDR
A collection of tools that allow information about the various transformations to be embedded at each stage in the workflow, together with a visualization tool that uses this embedded information to display the relationships between the various intermediate documents.
Creating structured PDF files using XML templates
TLDR
This paper describes a tool for recombining the logical structure from an XML document with the typeset appearance of the corresponding PDF document thereby creating a Structured/Tagged PDF.
Lessons from the dragon: compiling PDF to machine code
TLDR
It is shown that it is possible to compile a page description directly into machine code, bypassing the need to interpret the page description, which can bring a speed increase in PDF rendering and could also help increase document accessibility.
A Practical Method for Compatibility Evaluation of Portable Document Formats
TLDR
A method for verifying that PDF documents are compatible with the publication models provided by scientific publishers; it reports the degree of document compatibility with the model along with errors and warning messages.
Development of the XML Digital Library from the Parliament of Andalucía for Intelligent Structured Retrieval
TLDR
Describes the development of a Spanish-language XML digital library built from official documents published by the Parliament of Andalucía, so that users of the regional chamber's website can take advantage of structured information retrieval.
Automatic Generation of Printed Representations of Ecuadorian Electronic Invoices through XML Data Binding
TLDR
Improvements are shown not only in the generation time of printed electronic invoices but also in more robust and secure mechanisms for handling electronic vouchers through their representations in XML format.

References

A two-view document editor with user-definable document structure
TLDR
This thesis describes a two-view document editor, Lilac, that demonstrates such a design and its practicality on today's workstation computers. Three major problems are addressed: language design, WYSIWYG editor design, and implementation for real-time performance.
Document analysis of PDF files: methods, results and implications
SUMMARY A strategy for document analysis is presented which uses Portable Document Format (PDF — the underlying file structure for Adobe Acrobat software) as its starting point. This strategy
Quill: an extensible system for editing documents of mixed type
TLDR
A rigorous specification of the shell/editor interface enables additional editors to be added to the Quill system without affecting the existing editors.
Journal Publishing with Acrobat: the CAJUN Project
TLDR
The paper describes the progress so far of the CAJUN (CD-ROM Acrobat Journals Using Networks) project and gives a brief assessment of PDF's suitability as a universal document interchange standard.
The Document Object Model (DOM)
TLDR
This chapter introduces DOM programming as a technique for achieving this objective by manipulating the data and structure in an XML document.
JANUS: An Interactive Document Formatter Based on Declarative Tags
TLDR
The architecture of an experimental document composition system named JANUS, which is intended to support authors of complex documents containing mixtures of text and images, is described.
Separable hyperstructure and delayed link binding
TLDR
The case is made by studying the advantages of program/data separation in computer system architectures and also by re-examining some selected hypermedia systems that have already implemented separability.
Third Edition) version 1
  • 2001
Second Edition) version 1
  • 2000
The Document Object Model (DOM). http://www.w3c.org/TR/2000/REC-DOMLevel-2-Core
  • 2000