VARD2: a tool for dealing with spelling variation in historical corpora
TLDR
We present an overview of the VARD tool, our proposed solution to the problem of spelling variation, which facilitates pre-processing of historical corpus data by inserting modern equivalents alongside historical spelling variants.
  • 133
  • 20
Tagging the Bard: Evaluating the accuracy of a modern POS tagger on Early Modern English corpora
TLDR
In this paper we focus on automatic part-of-speech (POS) annotation, in the context of historical English texts.
  • 68
  • 9
Automatic standardisation of texts containing spelling variation: How much training data do you need?
TLDR
This paper evaluates the performance of VARD 2’s automatic standardisation in terms of precision and recall.
  • 32
  • 6
Panning for gold: Automatically analysing online social engineering attack surfaces
TLDR
We demonstrate that it is possible to automatically identify employees of an organisation using only information which is visible to a remote attacker as a member of the public.
  • 30
  • 2
Guidelines for normalising Early Modern English corpora: Decisions and justifications
TLDR
We propose a number of guidelines for normalising corpora and show how these guidelines have been applied in the Corpus of English Dialogues.
  • 35
  • 2
Tagging Historical Corpora - the problem of spelling variation
TLDR
This paper proposes a corpus pre-processor for detecting historical spelling variants and inserting modern equivalents alongside them.
  • 22
  • 2
Metaphor, Popular Science, and Semantic Tagging: Distant Reading with the Historical Thesaurus of English
TLDR
The use of metaphor in popular science is widespread to aid readers’ conceptions of the scientific concepts under discussion.
  • 7
  • 2
Customising geoparsing and georeferencing for historical texts
TLDR
In order to better support the text mining of historical texts, we propose a combination of complementary techniques from Geographical Information Systems, computational and corpus linguistics.
  • 26
  • 1
Automatically Analyzing Large Texts in a GIS Environment: The Registrar General's Reports and Cholera in the 19th Century
TLDR
The aim of this article is to present new research showcasing how Geographic Information Systems in combination with Natural Language Processing and Corpus Linguistics methods can offer innovative avenues of research to analyze large textual collections in the Humanities, particularly in historical research.
  • 21
  • 1