Shortening the OED: experience with a grammar-defined database
@article{Blake1992ShorteningTO, title={Shortening the OED: experience with a grammar-defined database}, author={G. Elizabeth Blake and Tim Bray and Frank Wm. Tompa}, journal={ACM Trans. Inf. Syst.}, year={1992}, volume={10}, pages={213-232} }
Textual databases with highly variable structure can be usefully described by a grammar-defined model. One example of such a text is the Oxford English Dictionary. This paper describes a first attempt to apply technology based on this model to a real problem. A language called GOEDEL, which is a partial implementation of a set of grammar-defined database operators, was used to extract and alter a subset of the OED in order to assist the editors in their production of The Shorter Oxford English…
24 Citations
Text structure recognition using a region algebra
- Computer Science
- 2001
This thesis proposes an efficient batch parsing model, characterizes the region algebras to which it applies, and proposes an alternative approach based on the type of region algebra often used as a query language for text databases.
Transformation of structured documents
- Computer Science
- 1996
The conclusion was that simple, local transformations can be automated or semi-automated, depending on whether additional information is needed, while global transformations are difficult to automate.
An Algebra for Structured Text Search and a Framework for its Implementation
- Computer Science, Mathematics
- Comput. J.
- 1995
A query algebra is presented that expresses searches on structured text; it permits queries that harness document structure and manipulate arbitrary intervals of text, which are recognized in the text from implicit or explicit markup.
Retrieval from hierarchical texts by partial patterns
- Computer Science
- SIGIR
- 1993
This work describes a query language for retrieving information from collections of hierarchical text based on a tree pattern matching notion called tree inclusion, which allows easy expression of queries that use the structure and the content of the document.
Transformation of Structured Documents with the Use of Grammar
- Computer Science
- Electron. Publ.
- 1993
The method uses grammars to define both the structure of documents and the transformations between structures; the paper also describes its implementation through certain modifications to a syntax-directed document processing system created by the authors.
Structured Document Transformations
- Computer Science
- 1997
Alchemist is a transformation language based on TT-grammars, extended with semantic actions in order to make it possible to build full-scale transformations.
Views of Text
- Computer Science
- 1997
Text databases are becoming increasingly important in business applications; this paper presents some properties of simple document models and text algebras and briefly relates them to conventional relational systems.
Data Model for Document Transformation and Assembly (Extended Abstract)
- Computer Science
- 1998
This paper shows a data model for transforming and assembling document information such as SGML or XML documents that simultaneously provides (1) powerful patterns and contextual conditions, and (2) schema transformation.
A language for queries on structure and contents of textual databases
- Computer Science
- SIGIR '95
- 1995
The key idea of the model is that a set-oriented query language based on operations on nearby structure elements of one or more hierarchies is both expressive and efficiently implementable, offering a good tradeoff between the two goals.
References
Showing 1-10 of 29 references
Mind Your Grammar: a New Approach to Modelling Text
- Computer Science
- VLDB
- 1987
The grammar-based model presented here builds on the traditional foundations of computer science, particularly database theory and practice, and uses grammars as schemas and “parsed strings” as instances to create a database model for text-dominated database systems.
Programming Languages: Design and Implementation
- Computer Science
- 1975
This book explores the major issues in both design and implementation of modern programming languages and provides a basic introduction to the underlying theoretical models on which these languages are based.
Office Document Architecture and Office Document Interchange Formats: Current Status of International Standardization
- Computer Science
- Computer
- 1985
The architectural model, the underlying processing model, and the principles of the interchange formats of the ECMA 101 and ISO drafts are introduced, and possibilities of further development indicated.
SGML handbook
- Computer Science
- 1990
This book introduces generalized markup, a model that automates the labor-intensive, and therefore time-consuming and expensive, process of developing and distributing SGML documents.
No Silver Bullet: Essence and Accidents of Software Engineering
- History
- Computer
- 1987
This article shall try to show why there is no single development, in either technology or management technique, that by itself promises even one order-of-magnitude improvement in productivity, in reliability, in simplicity.
Proceedings
- 1947
…s: Keynote lectures 9; Abstracts: VK Prize (lectures) 13; Abstracts: VK Prize (poster presentations) 27; …
PAT 3.3 User’s Guide
- Centre for the New Oxford English Dictionary
- 1988
Document Design with HiTeX: A Step beyond LaTeX
- Computer Science
- 1987
This paper analyzes how design specifications can be implemented with LaTeX and presents the concept of a new system called HiTeX, based on a modular document model, which enables parameter-controlled implementations of design specifications for hierarchically nested documents.
Making it short: The Shorter Oxford English Dictionary
- History
- 1986
Of the early history of the Shorter Oxford English Dictionary on Historical Principles little is known beyond the brief facts set out in the preface to the first edition: that specimens were…