The Archaeotools project: faceted classification and natural language processing in an archaeological context

  title={The Archaeotools project: faceted classification and natural language processing in an archaeological context},
  author={Stuart Jeffrey and Joanna C. Richards and Fabio Ciravegna and Stewart Waller and Sam Chapman and Z. Zhang},
  journal={Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences},
  pages={2507 - 2519}
  • S. Jeffrey, J. Richards, Z. Zhang
  • Published 28 June 2009
  • Computer Science
  • Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
This paper describes ‘Archaeotools’, a major e-Science project in archaeology. The aim of the project is to use faceted classification and natural language processing to create an advanced infrastructure for archaeological research. The project aims to integrate over 1×10⁶ structured database records referring to archaeological sites and monuments in the UK, with information extracted from semi-structured grey literature reports, and unstructured antiquarian journal accounts, in a single…
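As a rough illustration of what faceted classification means for browsing such records, the sketch below filters a handful of made-up records by facet value and counts the values that remain. The facet names (`what`, `where`, `when`), the sample records, and the helper functions are hypothetical and are not taken from the Archaeotools system.

```python
from collections import Counter

# Hypothetical records; facet names and values are illustrative only.
RECORDS = [
    {"id": 1, "what": "barrow", "where": "Wiltshire", "when": "Bronze Age"},
    {"id": 2, "what": "hillfort", "where": "Dorset", "when": "Iron Age"},
    {"id": 3, "what": "barrow", "where": "Dorset", "when": "Bronze Age"},
    {"id": 4, "what": "villa", "where": "Wiltshire", "when": "Roman"},
]


def filter_records(records, **selected):
    """Keep the records whose facet values match every selected facet."""
    return [
        r for r in records
        if all(r.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(records, facet):
    """Count how many records carry each value of a single facet."""
    return Counter(r[facet] for r in records)


# Narrow by one facet, then see which values of another facet remain.
barrows = filter_records(RECORDS, what="barrow")
print([r["id"] for r in barrows])  # [1, 3]
print(dict(facet_counts(barrows, "where")))
```

Each selection narrows the result set, and the remaining counts tell the user which further refinements would still return records, which is the usual appeal of a faceted interface over a single search box.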


Integrating archaeological literature into resource discovery interfaces using natural language processing and name authority services
This paper provides an overview of several approaches to integrating legacy archaeological literature into geospatial search mechanisms, drawing on the Archaeotools e-Science project and its use of natural language processing and a geospatial cross-walk service.
Text Mining in Archaeology: Extracting Information from Archaeological Reports
The chapter describes archaeological user needs, drawing examples from several countries, and presents examples from the authors' own projects and from previous work by others of how NLP and IE can help address these needs.
A pilot investigation of information extraction in the semantic annotation of archaeological reports
Results are reported from an initial evaluation, which suggest that these information extraction techniques can be applied to archaeological grey literature reports, and further work is discussed drawing on the evaluation and consideration of the characteristics of the archaeology domain.
A Minimalist Approach to Archaeological Data Management Design
This work describes a system that maintains a structural separation between recording a simple set of archaeological phenomena, and the functional, behavioral meanings, and temporal associations of these phenomena that are recorded rather than the phenomena upon which these interpretations are based.
Digital Transformation and Archaeology
In the era of digital archaeology, the communication of archaeological data/contexts/work can be enhanced by Cloud computing, AI, and other emergent technologies. The authors explore the most recent
The Digital Index of North American Archaeology: networking government data to navigate an uncertain future for the past
The ‘Digital Index of North American Archaeology’ (DINAA) project demonstrates how the aggregation and publication of government-held archaeological data can help to document human activity
Archaeology and the Semantic Web
For Archaeology to benefit from semantic technologies would require a severe sociological shift from current practice towards openness and decentralization, and whether such a shift is either desirable or feasible is raised as a topic for future work.
Knowledge-Based Named Entity Recognition of Archaeological Concepts in Dutch
The paper presents the development and evaluation of a Named Entity Recognition system of Dutch archaeological grey literature targeted at extracting mentions of artefacts, archaeological features, materials, places and time entities.
A study of semantic integration across archaeological data and reports in different languages
The study demonstrates the feasibility of connecting information extracted from datasets and grey literature reports in different languages and semantic cross-searching of the integrated information and opens new possibilities for integrative research across diverse resources.
Extracting Information from Archaeological Texts
To address archaeology’s most pressing substantive challenges, researchers must discover, access, and extract information contained in the reports and articles that codify so much of


Thinking Outside the Search Box: The Common Information Environment and Archaeobrowser
The paper describes how the Archaeology Data Service has tackled some of the problems of searching complex heritage datasets using a standard search box by creating a browser-based demonstrator, the Archaeobrowser, which makes use of facetted classification technology to guide the user through over 1,000,000 records from multiple datasets.
A Chain of Text-mining to Extract Information in Archaeology
A text-mining system used to extract archaeological knowledge from specialized texts, using user-friendly tools that let experts easily transfer their knowledge and inductive algorithms that reduce the experts' workload.
A Faceted Query Engine Applied to Archaeology
This work has developed a general faceted domain model and a query language for hierarchically classified data that can provide global access to sizeable datasets in queriable format and can serve as a valuable tool for data analysis and research in many application domains.
Bridging the Two Cultures – Commercial Archaeology and the Study of Prehistoric Britain
This paper was given at a meeting of the Society held on 12 January 2006 and it discusses the relationship between academic research and developer-funded archaeology in Britain today, highlighting
Information Extraction
  • M. Pazienza
  • Computer Science
    Lecture Notes in Computer Science
  • 2002
This paper discusses attempts to derive templates directly from corpora and to derive knowledge structures and lexicons directly from corpora, including discussion of the recent LE project ECRAN, which attempted to tune existing lexicons to new corpora.
Stepping back from the trench edge: an archaeological perspective on the development of standards for recording and publication
Semantic annotation for knowledge management: Requirements and a survey of the state of the art
Introduction to Information Extraction Technology
Finding needles in haystacks
Probes that reversibly recognize oligohistidine sequences engineered into proteins can provide insights into the structure and location of proteins in living cells.
Bridging the two cultures
A Sense of the Future. By J. Bronowski. Pp. 286. (Massachusetts Institute of Technology Press: Boston, Massachusetts and London, 1977.) $12.50; £7.25.