Review of automatic document formatting

@inproceedings{Hurst2009ReviewOA,
  title={Review of automatic document formatting},
  author={Nathan Hurst and Wilmot Li and Kim Marriott},
  booktitle={DocEng '09},
  year={2009}
}
We review the literature on automatic document formatting with an emphasis on recent work in the field. One common way to frame document formatting is as a constrained optimization problem where decision variables encode element placement, constraints enforce required geometric relationships, and the objective function measures layout quality. We present existing research using this framework, describing the kind of optimization problem being solved and the basic optimization techniques used to… 
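As a rough illustration of the constrained-optimization framing described in the abstract, the following toy sketch chooses column widths for a small table. It is not taken from the paper: the function names (`row_height`, `best_layout`), the character-count model of cell content, and the exhaustive search are all assumptions made for this example. The column widths are the decision variables, the page-width limit is the constraint, and total table height is the objective.

```python
import math
from itertools import product


def row_height(text_len, width):
    """Lines needed to fit text_len characters into a column `width` characters wide."""
    return math.ceil(text_len / width)


def best_layout(cells, page_width, candidate_widths):
    """Exhaustively search column widths (hypothetical helper, illustration only).

    cells: list of rows, each a list of character counts, one per column.
    Returns (height, widths) for the lowest table found, or None if no
    combination of candidate widths fits within page_width.
    """
    n_cols = len(cells[0])
    best = None
    for widths in product(candidate_widths, repeat=n_cols):  # decision variables
        if sum(widths) > page_width:  # constraint: table must fit the page
            continue
        # Objective: table height, where each row is as tall as its tallest cell.
        height = sum(
            max(row_height(c, w) for c, w in zip(row, widths)) for row in cells
        )
        if best is None or height < best[0]:
            best = (height, widths)
    return best


cells = [[40, 10], [20, 30]]
result = best_layout(cells, page_width=30, candidate_widths=[10, 15, 20])
print(result)  # (5, (15, 15))
```

Exhaustive enumeration is only workable for tiny instances like this one; it is used here just to make the three ingredients (variables, constraints, objective) explicit.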
Automatic table layout and formatting
TLDR
A Table Drawing Tool prototype is developed which implements an automated solution for the table layout optimization problem using a mathematical modelling method and presents three models of the problem for tables with spanning cells and inner tables.
Balancing font sizes for flexibility in automated document layout
TLDR
This paper presents an improved approach for automatically laying out content onto a document page when the number and size of the items are unknown in advance. It also presents an analytical approximation for text placement, refined by curve fitting over TeX-generated data.
Automated layout preservation in cross language translation of document: an integrated approach and implementation
TLDR
This paper proposes an integrated approach to the layout problems that arise when translating a document, such as content flow, the table of contents, and maintaining the relative position and aesthetics of content.
Using ancestral layout models for document digitization
In this article, we show how some concepts found in traditional and old layout practices used to lay out text (ruling, grid) can improve document digitization. We will first present these basic layout
Optimal automatic table layout
TLDR
This work presents three different approaches to finding the minimum height layout based on standard approaches for combinatorial optimization, an A*-based approach that uses an admissible heuristic based on the area of the cell content, and a hybrid CP/SAT approach, lazy clause generation, that uses learning to reduce the search required.
DPLfw: a framework for variable content document generation
TLDR
This paper defines the DPLfw architecture, and illustrates its use in the definition of variable-content emergency plans, an implementation of the Document Product Lines (DPL) approach which was defined with the aim of supporting variable content document generation from a domain-oriented point of view.
Optimal pagination and content mapping for customized magazines
TLDR
The algorithm is able to find the optimal number of pages to hold the content, selecting the best templates to be used in the magazine in such a way that all pages are optimally used.
Automatic Minimal-Height Table Layout
TLDR
This work investigates the modelling decisions involved in formulating this problem for use with standard combinatorial optimization techniques that are guaranteed to find the minimal-height table and provides a detailed empirical evaluation of the resulting models using mixed integer programming and constraint programming with lazy clause generation.
Truncation: all the news that fits we'll print
TLDR
A new semantic-focused approach to rate the quality of a truncation point is presented, which shows that semantic-based modeling is critical for high-quality automated document synthesis within a real-world context and over-cut content is shown.
XML-based Variable Data Publishing System with Dynamic Editing and Formatting Function
TLDR
A variable data publishing system with dynamic editing and formatting functions is proposed. It supports fast formatting of large-volume documents on the user's request, as well as interactive template editing, by displaying the result of template-based variable documents on a WYSIWYG screen.

References

SHOWING 1-10 OF 82 REFERENCES
Two algorithms for automatic document page layout
TLDR
Two approaches to the problem of automatically placing document items on pages of some output device work on different input data according to the application, and try to preserve the reading order provided by the input and use all available area on the page.
Extensible layout in functional documents
TLDR
The Document Description Framework incorporates a model for declarative document layout and processing where documents are treated as functional programs and a variable and reference mechanism is included for resolving rendering interdependency and supporting component reuse.
Constraint-based document layout for the Web
TLDR
A prototype constraint-based Web authoring system and viewing tool that provides linear arithmetic constraints for specifying the layout of the document as well as finite-domain constraints for specifying font size relationships is described.
Automatic float placement in multi-column documents
TLDR
It is found that one of the A* based approaches is faster than the dynamic programming approach and, if a "window" of optimization is used, fast enough for moderately sized documents.
Setting tables and illustrations with style
TLDR
This thesis addresses the problem of formatting complex documents with electronic tools and introduces the concept of graphical style, a way of maintaining consistency in a document, to extend the more traditional notion of document style to illustrations.
Resolving layout interdependency with presentational variables
TLDR
An approach for XML-described layouts based on a post-rendering set of single-assignment variables, analogous to XSLT, that can make this much easier, does not compromise layout extensibility and can be a target for automated interdependency analysis and generation is presented.
Towards Constructive Text, Diagram, and Layout Generation for Information Presentation
TLDR
It is demonstrated that layout offers a rich resource for achieving presentational coherence, alongside more traditional resources such as text-formatting and the text-internal marking of discourse connections, and an integrated approach to layout, text, and diagram generation is introduced.
Toward tighter tables
TLDR
This work presents two new independently-applicable techniques for table layout and investigates two hybrid approaches both of which use iterative column widening to improve the quality of an initial solution found using a different technique.
Constrained XSL formatting objects for adaptive documents
TLDR
This paper describes a new approach to solve the pagination problem of XSL:FO documents where space use efficiency and aesthetic aspects are considered and shows its effectiveness in the generation of personalized welcome letters.
Formatting documents with floats: A new algorithm for LaTeX2e
TLDR
This paper describes an approach to placement of floats in multicolumn documents and typesets all but the present page of the paper using a version of LaTeX that incorporates a prototype implementation of the author's new algorithm.