Shortening the OED: experience with a grammar-defined database

@article{Blake1992ShorteningTO,
  title={Shortening the OED: experience with a grammar-defined database},
  author={G. Elizabeth Blake and Tim Bray and Frank Wm. Tompa},
  journal={ACM Trans. Inf. Syst.},
  year={1992},
  volume={10},
  pages={213--232}
}
Textual databases with highly variable structure can be usefully described by a grammar-defined model. One example of such a text is the Oxford English Dictionary. This paper describes a first attempt to apply technology based on this model to a real problem. A language called GOEDEL, which is a partial implementation of a set of grammar-defined database operators, was used to extract and alter a subset of the OED in order to assist the editors in their production of The Shorter Oxford English… 

Text structure recognition using a region algebra
TLDR
This thesis proposes an efficient batch parsing model, characterizes the region algebras to which it applies, and proposes an alternative approach based on the type of region algebra that is often used as a query language for text databases.
Grammars++ for Modelling Information in Text
Transformation of structured documents
TLDR
The conclusion was that simple and local transformations can be automated or semi-automated, depending on whether additional information is needed, while global transformations are difficult to automate.
An Algebra for Structured Text Search and a Framework for its Implementation
TLDR
A query algebra is presented that expresses searches on structured text, permits queries that harness document structure, and manipulates arbitrary intervals of text, which are recognized in the text from implicit or explicit markup.
Retrieval from hierarchical texts by partial patterns
TLDR
This work describes a query language for retrieving information from collections of hierarchical text based on a tree pattern matching notion called tree inclusion, which allows easy expression of queries that use the structure and the content of the document.
Transformation of Structured Documents with the Use of Grammar
TLDR
The method uses grammars to define both the structure of documents and the transformations between structures, and its implementation involves certain modifications to a syntax-directed document processing system created by the authors.
Structured Document Transformations
TLDR
Alchemist is a transformation language based on TT grammars, which have been extended with semantic actions to make it possible to build full-scale transformations.
Views of Text
TLDR
Text databases are becoming increasingly important in business applications. Some properties of simple document models and text algebras are presented, and these are briefly related to conventional relational systems.
Data Model for Document Transformation and Assembly (Extended Abstract)
TLDR
This paper shows a data model for transforming and assembling document information such as SGML or XML documents that simultaneously provides (1) powerful patterns and contextual conditions, and (2) schema transformation.
A language for queries on structure and contents of textual databases
TLDR
The key idea of the model is that a set-oriented query language based on operations on nearby structure elements of one or more hierarchies is quite expressive and efficiently implementable, being a good tradeoff between both goals.

References

SHOWING 1-10 OF 29 REFERENCES
Mind Your Grammar: a New Approach to Modelling Text
TLDR
The grammar-based model presented here builds on the traditional foundations of computer science, and particularly database theory and practice, and uses grammars as schemas and “parsed strings” as instances to create a database model for text-dominated database systems.
Programming Languages: Design and Implementation
TLDR
This book explores the major issues in both design and implementation of modern programming languages and provides a basic introduction to the underlying theoretical models on which these languages are based.
Office Document Architecture and Office Document Interchange Formats: Current Status of International Standardization
TLDR
The architectural model, the underlying processing model, and the principles of the interchange formats of the ECMA 101 and ISO drafts are introduced, and possibilities of further development indicated.
SGML handbook
TLDR
This paper introduces generalized markup, a model for generalized markup that automates the very labor-intensive and therefore time-heavy and expensive process of developing and distributing SGML documents.
No Silver Bullet: Essence and Accidents of Software Engineering
TLDR
This article shall try to show why there is no single development, in either technology or management technique, that by itself promises even one order-of-magnitude improvement in productivity, in reliability, in simplicity.
Proceedings
PAT 3.3 User’s Guide
  • Centre for the New Oxford English Dictionary,
  • 1988
Document Design with HiTeX: A Step beyond LaTeX
TLDR
This paper analyzes how design specifications can be implemented with LaTeX and presents the concept of a new system called HiTeX, based on a modular document model, which enables parameter-controlled implementations of design specifications for hierarchically nested documents.
Making it short: The Shorter Oxford English Dictionary
Of the early history of the SHORTER OXFORD ENGLISH DICTIONARY ON HISTORICAL PRINCIPLES little is known beyond the brief facts set out in the preface to the first edition: that specimens were